The correct way to model User and Roles in SQL

I'm designing a Java application and the model data is stored in an Oracle SQL database. I'm trying to design the best user/role model for what is needed.
Because of business rules all users have basic common information:
Identification ID
Name
Surname
Email
IsActiveUser
But then depending on the role, the user will have extra fields like:
Client Role:
Birth Date
Address
Lawyer Role:
Specialty
Professional Registration ID
Expert Role:
Occupation
Manager Role:
Region
I can think of two possible solutions:
User table will have all the common fields and the optional fields that will be filled depending on the role.
User table will only have the common fields, and then I create a Detail_User table to save the optional fields that vary with the role.
Do you think these possible solutions are good? Is there a better alternative?

Answering in the Relational database paradigm, as you have tagged it.
Do you think these possible solutions are good? Is there a better alternative?
No. This is a classic case for Subtypes.
I have a Role table and the User table has an FK to it, because every user will have only one role.
That won't solve your problem. You need to store the values for each instance of a Role, for each instance of a User.
Further, you will appreciate the correct solution only when you wish to constrain some child table (eg. Portfolio.LawyerId) to Lawyer, not User.
Data Model
The Data Model, at the IDEF1X/ER level (not ERD), is shown in the linked diagram.
Note
The Standard for Relational Data Modelling since 1983 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
For full definition and usage considerations re Subtypes, refer to Subtype Definition.
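Since the diagram itself is not reproduced here, here is a minimal DDL sketch of the subtype structure being described. This is generic SQL with invented table and column names; adjust names and types to your DBMS (is_active_user is shown as a CHAR(1) 'Y'/'N' flag because the question mentions Oracle, which historically has no BOOLEAN column type).

-- Supertype: one row per user, common columns only.
create table users (
    user_id        integer      primary key,
    name           varchar(100) not null,
    surname        varchar(100) not null,
    email          varchar(255) not null unique,
    is_active_user char(1)      default 'Y' not null
);

-- One subtype table per role. The primary key is also the foreign key
-- to the supertype, so a role row cannot exist without its user row.
create table clients (
    user_id    integer primary key references users (user_id),
    birth_date date    not null,
    address    varchar(255)
);

create table lawyers (
    user_id                      integer primary key references users (user_id),
    specialty                    varchar(100) not null,
    professional_registration_id varchar(50)  not null
);

create table experts (
    user_id    integer primary key references users (user_id),
    occupation varchar(100) not null
);

create table managers (
    user_id integer primary key references users (user_id),
    region  varchar(100) not null
);

Whether the subtypes must be exclusive (exactly one role per user) or may overlap is a separate constraint; the Subtype Definition link above covers the discriminator column and checks needed to enforce exclusivity.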

Unless you are going to have zillions of rows, there is no need to split the table into the two that you specify.
You might consider a separate table for each role -- or more specifically for each role that has bespoke columns. This would give you a little efficiency in storage space in many cases (versus NULL) in a wider record, although that depends on the database and data types being used.
A more important reason to split them is for foreign key references. If you have other tables where "lawyer" would be a foreign key, then you need a lawyers table for that. Voila! Having a separate table for different roles allows such specialized relationships, as well as general purpose relationships for all users.
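As a hedged sketch of that point, assuming the users and lawyers tables from the subtype sketch above (the portfolios and audit_log tables are hypothetical):

-- A portfolio must belong to a lawyer, not to an arbitrary user.
create table portfolios (
    portfolio_id integer primary key,
    lawyer_id    integer not null references lawyers (user_id),
    title        varchar(200) not null
);

-- A general-purpose relationship can still reference any user.
create table audit_log (
    audit_id  integer primary key,
    user_id   integer not null references users (user_id),
    logged_at timestamp not null
);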

Related

How to avoid defining two extra tables for an object that can have only 3 possible values and is part of a many-to-many relationship?

I tried to search this online but found this question quite difficult to formulate in a concise & intelligible way.
I am developing an application which enables users to choose from 3 types of authentications: Password, Finger Print & Face Recognition. Each user may opt for multiple types of these 3 and I need to store their picks in a relational database. So theoretically, there exists a many-to-many relationship between users and authentication_types.
I know this seems quite trivial and probably I am overanalysing things, but which would be the optimal way to model this at the relational database level? What I am trying to avoid, but which seems to be the only reasonable solution in a relational DB setting, is to create a table for login types (say LoginTypes) in which to store the 3 login types mentioned above, and to create an intermediary table for the many-to-many relationship (say UsersLoginTypes).
What's a little frustrating for me is that for only 3 types of login, I need to create one table to store them and another one for the many-to-many relationship. And any time I want to get the login types chosen by a user, I cannot simply select the user and extract the login types from the user's object; I need to make a query that involves two other tables (LoginTypes & UsersLoginTypes). Am I missing a simpler solution here?
I thought of maybe assigning each login type a digit (e.g. Password - 1, Fingerprint - 2, Face Recognition - 3) and having a field in the User model for the login types, in which to store a string containing the digits corresponding to what the user chose. And eventually, this is perhaps what I would go for if no better solution exists.
PS. I am using Ruby on Rails with ActiveRecord, if this changes something.
In 1997, I normalised a relational database model to death. It worked and was extremely flexible, but it invariably ground to a halt whenever you wanted to formulate an unforeseen query. It was already very tedious to formulate the query in the first place. (Of course, that was at a time when you always wrote your SQL manually; BI tools were a thing of the future.)
So: a (master) table users, a (lookup) table login_types, and a (child/intermediate) table active_users_authentications, as your first shot, is the correct way of modelling it relationally.
But if you want the system to be efficient/performant (and you don't need any further details for the authentication configurations, which you would store in active_users_authentications, of course), I for one would find it absolutely legitimate to have 3 Boolean (Yes/No) columns in the users table and call them has_pwd_auth, has_fgnpr_auth, and has_facerec_auth.
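Both options can be sketched in plain SQL; these table and column names follow the answer's suggestions but are otherwise illustrative:

-- Option 1: fully normalised many-to-many.
create table users (
    user_id integer primary key,
    name    varchar(100) not null
);

create table login_types (
    login_type_id integer primary key,
    name          varchar(50) not null unique  -- 'password', 'fingerprint', 'face_recognition'
);

create table active_users_authentications (
    user_id       integer not null references users (user_id),
    login_type_id integer not null references login_types (login_type_id),
    primary key (user_id, login_type_id)
);

-- Option 2: denormalised flag columns on users instead of the two extra tables.
-- alter table users add column has_pwd_auth     boolean default false not null;
-- alter table users add column has_fgnpr_auth   boolean default false not null;
-- alter table users add column has_facerec_auth boolean default false not null;

With the flag columns, reading a user's login types is a single-row lookup; the trade-off is that adding a fourth authentication type later means an alter table rather than a simple insert into login_types.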

Is it a good practice to map UserAccount Table with all other tables in SQL Server?

I have a UserAccount table and other tables like Employee, Student, etc. I want to have an audit trail showing who created a student record or who created a certain employee record. Is it good practice to have UserAccountId as a foreign key in all the other tables like Employee, Student, etc.? I am using Hibernate, and if I map it like this I have to maintain a one-to-many relationship between UserAccount and all the other classes, so the code grows, and for me that is a burden.
Well, it breaks all normalisation rules. Have a link table instead: UserAccountID, EmployeeID (NULL), StudentID (NULL). Have one massive link table like this. The foreign keys need to be nullable, apart from UserAccountID (primary key and foreign key).
"Good habit/practice" is subjective.
If the business domain includes the fact that the person who created an entity is a meaningful piece of information, and that this is likely to be a regular request by end users, then adding a "createdBy" attribute to your tables/classes is, indeed, good practice.
The best way to know whether this is true is to ask the product owner whether they would need a screen showing "all employees created by user x". If they say "no, only if something goes wrong", you have an audit requirement; if they say "yes, we'll use that regularly", it's an integral part of your business domain.
You may find that your users want to know not just who created a row, but also who modified it. In that case, there are similar questions on SO.
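If the answer is yes, the createdBy (and optionally modifiedBy) attribute is just a foreign key back to UserAccount on each audited table. A hedged sketch, assuming a user_account table keyed by user_account_id:

create table employee (
    employee_id integer primary key,
    full_name   varchar(200) not null,
    created_by  integer not null references user_account (user_account_id),
    modified_by integer references user_account (user_account_id)
);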

Creating an SQL Schema (postgresql)

I'm having problems creating a schema for a PostgreSQL project.
It's for a social networking site where there are profiles, and each profile comes in three varieties: generic, education, and employment. Each variety requires different attributes, so how do we do this all in one table?
create type ProfileTypeValue as enum
('generic', 'education', 'employment');
create table Profiles (
    id integer,
    type ProfileTypeValue,
    ....?
    primary key (id)
);
because for instance if it's an education profile, then we need to have institution name etc, or if it's an employment profile, then we need to have an employer name attribute, etc.
Is it best to just have 3 different tables, 1 for each profile type? I don't know if that's possible… but I feel like I need an if statement saying if it's an education profile, include these attributes, or if it's an employment profile, include those attributes, etc.
Here are a few options:
All in the same table
Common profile attributes in one table, profile type specific in their own tables with foreign key references to the common profile table
Inheritance
Key-Value store
All in the same table
In this option all the fields are always present whatever type the profile is. This is the easy thing to do the first time around, as you only have to list all the columns. However, it is really bad design that will make your life harder in the long run, because the maintainability and extensibility are poor. You should read up on database normal forms etc. Don't do this.
Master profile table and profile type dependent details on their own tables
In this option you will create a table for all profiles. This will include all the common attributes. This table will make sure the identifiers are all in the same namespace and each profile has a unique id. For each profile type you'll create a new table that has a foreign key reference to the master profile table. You can then select all employment profiles using an inner join on the employment profile table and the master profile table. This design allows you to create constraints for each profile type. Furthermore, this design lets you have profiles that are both employment and education profiles. You should probably do this.
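A hedged sketch of that layout in PostgreSQL, with invented column names:

create table profiles (
    profile_id   serial primary key,
    display_name text not null
);

create table education_profiles (
    profile_id       integer primary key references profiles (profile_id),
    institution_name text not null
);

create table employment_profiles (
    profile_id    integer primary key references profiles (profile_id),
    employer_name text not null
);

-- All employment profiles together with their common attributes:
select p.profile_id, p.display_name, e.employer_name
from profiles p
join employment_profiles e on e.profile_id = p.profile_id;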
Inheritance
Postgres provides a facility for table inheritance. You can use this by creating a base table for all profile types and then creating child tables for each profile type. Each profile type then inherits all the attributes defined in the parent table. With inheritance you can select all profiles using the parent table and all employment profiles using the employment profile table. If generic profiles use only common attributes, they can be stored in the parent table.
The main disadvantage of inheritance in Postgres is that the parent table and the child tables do not share the same namespace. You cannot create a unique constraint that spans all the tables. This means that you have to make sure that the identifiers are globally unique in some other way, e.g. by keeping a separate table for the profile identifiers.
You should consider whether the disadvantages of inheritance matter in your situation. However, this is the sensible way of doing separate tables for all profile types if you are using Postgres, as you don't have to duplicate the definitions of the common attributes.
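For comparison, a minimal sketch of the inheritance option, reusing the same illustrative names (PostgreSQL syntax):

create table profiles (
    id           integer,
    display_name text
);

create table education_profiles (
    institution_name text
) inherits (profiles);

create table employment_profiles (
    employer_name text
) inherits (profiles);

-- Rows from profiles and from every child table:
select id, display_name from profiles;

-- Only the employment profiles:
select id, display_name, employer_name from employment_profiles;

-- Note: a primary key or unique constraint declared on profiles would not
-- apply to rows inserted directly into the child tables.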
Key-value store
You could also create a table for the common profile attributes and keep the rest of the attributes in (profile, attribute, value) tuples. By doing this, you'd discard the benefits of an RDBMS and you'd have to implement all the logic in your program. Don't do this.
PostgreSQL supports table-level inheritance. You can make a Profile table the parent table with the common attributes and then separate child tables for education and employment with only the attributes specific to those categories.
Check out the PostgreSQL documentation here.

Should I include user_id in multiple tables?

I'm at the planning stages of a multi-user application where each user will only have access to their own data. There'll be a few tables that relate to each other, so I could use JOINs to ensure they're accessing only their data, but should I include user_id in each table? Would this be faster? It would certainly make some of the queries easier in the long run.
Specifically, the question is about multiple tables containing the user_id field.
For example, each user can configure categories, items (in those categories), and sub-items against those items. There's a logical path from user to sub-items through the other tables, but it would require 3 JOINs. Should I just include user_id in all the tables?
Thanks!
This is a design decision in multi-tenant databases. With "root" tables, obviously you have to have the user_id. But in the non-"root" tables, you do have a choice when you are using surrogate PKs.
Say you have users with projects and projects with actions. Projects obviously has to have a user_id, but if actions are tied to one and only one project, then the user_id is redundant, and also violates normal form, since if it was to move to another user's project (probably not likely in your use cases), both the project FK and the user FK would have to be updated. Typically in multi-tenant scenarios, this isn't really a possible scenario, and so the primary key of every table is really a combination of tenant and a unique primary key "within" the tenant (which may also happen to be globally unique).
If you use natural keys extensively in your design, then clearly tenant+natural key is necessary so that each tenant's natural keys can be used. It's only when using surrogates like IDENTITY or GUIDs or sequences, that this becomes an issue, since it is tempting to make the IDENTITY the PK, after all, it is unique by definition.
Having the user_id in all tables does allow you to do certain things in views to enhance security (defense in depth), giving you a little bit of defensive programming (in SQL Server you can restrict all access through inline table valued function - essentially parametrized views - which require the app to specify user_id on every "table" access), and also allows you to easily scale out to multiple databases by forklifting everything on shared keys.
See this article for some interesting insights.
(In a massively multi-parallel paradigm like Teradata, the PRIMARY INDEX determines the amp on which the data lives, so I would think that this is a must to stop redistribution of rows to the other amps.)
In general, I would say you should have a tenantid in each table; it should be the first column in the table and in most indexes, and it should be part of the primary key in most cases, unless otherwise justified. Where possible, it should be a required parameter in most stored procedures.
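A hedged sketch of that composite-key style, using the projects/actions example from the answer (names and types are assumptions):

create table users (
    user_id integer primary key
);

create table projects (
    user_id    integer not null references users (user_id),
    project_id integer not null,
    title      varchar(200) not null,
    primary key (user_id, project_id)   -- the tenant id leads the key
);

create table actions (
    user_id    integer not null,
    project_id integer not null,
    action_id  integer not null,
    note       varchar(500),
    primary key (user_id, project_id, action_id),
    foreign key (user_id, project_id) references projects (user_id, project_id)
);

Because user_id is carried down the chain, "all actions for user X" needs no intermediate joins, and the composite foreign key stops an action from pointing at another user's project.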
Generally, you use foreign keys to relate data between tables. In many cases, this foreign key is the user id. For example:
users
  id
  name

phonenumbers
  user_id
  phonenumber
So yes, that'd make perfect sense.
If a category can only belong to one user then yes, you need to include the user_id in the category table. If a category can belong to multiple people then you would have a separate table that maps category IDs to user IDs. You can still do this if you have a one to one mapping between the two, but there is no real reason for it.
You don't need to include the user_id in further tables if you can guarantee that those child tables will always be accessed by joining to the category table. If there is a chance that you will access them independently of the category table then you should also have the user_id on those tables.
The extent to which to normalize can be a difficult decision. One of the best StackOverflow answers on this topic (Database Development Mistakes Made by App Developers) warns against both (1) failing to normalize, and (2) over-normalizing.
You mention that it might be easier "in the long run" to repeat the same data in multiple tables (that is, not to normalize that data). Look at the "Not simplifying complex queries through views" topic in the previous link. If you use views effectively, you will only have to do the 3 join query once when writing the view and then you can use a query with no joins for most purposes.
Most developers tend to under-normalize because it seems simpler. Go ahead and normalize. Use views to simplify your daily queries. When your requirements get more complex or you decide to add features, you will be glad that you put time into a relational database design.
Alternatively, depending on your toolset, you may want to use a database abstraction layer that does the relational design under the covers while you manipulate higher-level data objects.
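A hedged example of the view idea, with table and column names invented to match the categories/items/sub-items scenario:

-- Write the three-way join once...
create view user_sub_items as
select c.user_id, c.category_id, i.item_id, s.sub_item_id, s.name
from categories c
join items     i on i.category_id = c.category_id
join sub_items s on s.item_id     = i.item_id;

-- ...then everyday queries stay join-free:
select * from user_sub_items where user_id = 42;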
If it is Oracle, then you would probably set up a fine-grained security rule to do the joins and prevent certain activities based on the existence of the original user id (SELECT, INSERT, UPDATE, DELETE, etc.).
You would need a map between the logged-in user and the user_id. You could use uid, but then remember this number may change if the database is reconstructed after some disaster...

How can I automatically determine table(s) schema from a set of queries?

Is there any tool which will take a set of CRUD queries and generate a 'good enough' table schema for that set?
e.g. I can provide input like this:
insert username, password
insert username, realname
select password where username=?
update password where username=?
update realname where username=?
With this input, the tool should be able to make either 1, 2, or 3 tables, take care of _id's, and handle indexing.
To put it another way, I'm looking for a tool with which I can design a set of queries against a single, notionally infinite-column table, and the tool processes them and actually generates the databases/tables/columns, plus a high-level language module with a function call for each query.
Oh yes, I'm trying to fire my DB designer (-:
Have you considered using an ORM solution like Hibernate? This requires an initial set of mappings between the application class model (for example, the User class) and the database schema representation (e.g. the USER table).
An ORM solution may support advanced mapping scenarios where an object maps to more than one table in the schema. Also, newer versions of Hibernate support generating the database schema from the mappings (search for the hbm2ddl tool).
You're asking for the impossible.
How would the tool know that username should have an index on it, much less a unique index?
How would it know the data types of the columns?
How would it know any domain constraints — for example, a hypothetical sex column must be either male or female, not crimson?
Wouldn't it be pretty vulnerable to typos, leaving you with a username and a user_name column?
Databases require design for a (well, many) reasons. Questions of normalization, for example, are going to be very difficult for a tool—which can't understand your problem domain—to answer.
That said, it isn't automatic, but what you're asking for is, as Aleris answered, an ORM. You didn't specify which language you are using, but surely there is one (or more) for yours.