What is the best way to add users to multiple groups in a database? - sql

In an application where users can belong to multiple groups, I'm currently storing their groups in a column called groups as a binary. Every four bytes is a 32 bit integer which is the GroupID. However, this means that to enumerate all the users in a group I have to programatically select all users, and manually find out if they contain that group.
Another method was to use a unicode string, where each character is the integer denoting a group, and this makes searching easy, but is a bit of a fudge.
Another method is to create a separate table, linking users to groups. One column called UserID and another called GroupID.
Which of these ways would be the best to do it? Or is there a better way?

You have a many-to-many relationship between users and groups. This calls for a separate table to combine users with groups:
User: (UserId[PrimaryKey], UserName etc.)
Group: (GroupId[PrimaryKey], GroupName etc.)
UserInGroup: (UserId[ForeignKey], GroupId[ForeignKey])
To find all users in a given group, you just say:
select * from User join UserInGroup on UserId Where GroupId=<the GroupId you want>
Rule of thumb: If you feel like you need to encode multiple values in the same field, you probably need a foreign key to a separate table. Your tricks with byte-blocks or Unicode chars are just clever tricks to encode multiple values in one field. Database design should not use clever tricks - save that for application code ;-)

I'd definitely go for the separate table - certainly the best relational view of data. If you have indexes on both UserID and GroupID you have a quick way of getting users per group and groups per user.

The more standard, usable and comprehensible way is the join table. It's easily supported by many ORMs, in addition to being reasonably performant for most cases. Only enter in "clever" ways if you have a reason to, say a million of users and having to answer that question every half a second.

I would make 3 tables. users, groups and usersgroups which is used as cross-reference table to link users and groups. In usersgroups table I would add userId and groupId columns and make them as primary key. BTW. What naming conventions there are to name those xref tables?

It depends what you're trying to do, but if your database supports it, you might consider using roles. The advantage of this is that the database provides security around roles, and you don't have to create any tables.

Related

How to implement One-To-Many relationship with the same table?

I have a table called user. Now I am building an app where a user can have many users as friends. So I think I should create a new table called friends_list and implement one-to-many relationship where (user) is one and (friends_list) is many. Then to get the list of friends the user has, I could do select * from friends where userId = XXXx.
Is this the best approach? Or is there another better way to create a relationship with the same table?
Your approach is the best approach. You want to represent an n-m relationship (one user has many friends and vice versa).
There are some considerations. If "friendship" is symmetric (that is A friends B automatically means that B friends A), then you probably want to include both in the table when one is inserted. You may also want to prevent a user from self-friending.
If you want to retrieve user's friends as a list, you can do so using a string concatenation function. In Standard SQL, this is:
select listagg(friendid, ',') within group (order by friendid) as friendids
from friends
where userid = XXX;
Different databases have different function names for listagg().
An approach, and clearly NOT the best, would be to use an extra column to store a comma separated value of friend's user_ids.
If the answer to the question "who are my friends?" would not be too difficult to get, you'll need to rely on LIKE %XXXx% conditions and shouldnt expect very fast response times.
Another drawback would be some complexity with the relationship maintenance while editing the friend list.
Therefore, the two tables schema is both the most semantically correct and reliable one.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, a column containing the AD group and the second column containing what list of computers belong to that list.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.
I not really a bad idea to store multiple values in one column, but will depend the search you want.
If you just only want to know the persons that is part of a group then you can store persons in one column with a group id as key. For update you just update the entire list in a group.
But if you want to search a specified person that belongs to group, then its not recommended that you store this multiple persons in one column. In this case its better to store a itermedium table that store person id, and group id.
Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.
a list has some columns like: name, family name, phone number etc.
and rows like name=john familyName= lee number=12321321
name=... familyname=... number=...
an sql database works same way. every row in a sql database is a record. so you jusr add records of your list into your database using insert query.
complete explanation in here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to eachother. In this situation, it is often recommended to use a mapping table, a.k.a. "junction table" or "cross-reference" table. This table consist solely of the two foreign keys in your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

Is it better to have more or less tables in sql?

I have a large database, one of the tables is called Users.
There are two kinds of users in the database - basic users, and advanced users. In the users table, there are 29 columns. However, only 12 of these are applicable to the basic users - the other 17 columns are only used for advanced users, and for basic users they all just contain a value of null.
Is this an OK setup? Would it be more efficient to, say, split the two kinds of users into two different tables, or put all the extra fields that advanced users have in a separate table?
It's better to have the right amount of tables - this may be more or less, depending on your needs.
To your specific case, you should always start with third normal form and only revert to lesser forms when absolutely necessary (such as for performance) and only when you understand the consequences.
An attribute (column) belongs in a table if it is dependent on the key, the whole key and nothing but the key (so help me, Codd).
It's arguable whether your other 17 columns depend on the key in your user table but I would be separating them anyway just for the space saving.
Have your basic user table with the twelve columns (including a unique key of some sort) and your advanced user table with the other columns, and also that key so you can tie the rows from each together.
You could go even further and have a one to many relationship if your use case is that users can have any of the 17 attributes independent of each other but that doesn't seem to be what you've described.
It depends:
If the number of columns is large, then it will be more efficient to create two tables as you describe as you will not be reserving space for 17 columns which end up holding null.
You can always tack a view on the front which combines both tables, so your application code could be unaffected.
Yes its better to split up this table but not in two Its better to split in three table
User Table-
Contain common property of both user and Adavace user
UserID(PK)
UserName
Basic user -
Contains basic user property and have use primary key of user table and foreign key
USerID(FK) - from user table
BasicUsedetail
Advance user-
Contains Advance user property and have use primary key of user table and foreign key
USerID(FK) - from user table
AdvanceUsedetail
In this case, it's valid and more efficient to use 'single table per class hierarchy' in terms of speed to retrieve data but if you insert a BasicUser, it will reserve 17 columns per tuple just for nothing. This case is is so frequent that it is provided by ORMs such as Hibernate. Using this approach you avoid a join between tables which may be expensive depending the case.
The bad thing is that in case your design needs to scale in terms of types of users, you will need to add additional columns which many of them will be empty.
Usually it won't matter much, but if you got many many users and only a few of them are advanced user, it might be better to split. To my knowledge there are not exact rules of when to split and when not.

Should I include user_id in multiple tables?

I'm at the planning stages of a multi-user application where each user will only have access their own data. There'll be a few tables that relate to each other, so I could use JOINs to ensure they're accessing only their data, but should I include user_id in each table? Would this be faster? It would certainly make some of the queries easier in the long run.
Specifically, the question is about multiple tables containing the user_id field.
For example, each user can configure categories, items (in those categories), and sub-items against those items. There's a logical path from user, to sub-items through the other tables, but it would require 3 JOINs. Should I just include user_id in all the tables?
Thanks!
This is a design decision in multi-tenant databases. With "root" tables, obviously you have to have the user_id. But in the non-"root" tables, you do have a choice when you are using surrogate PKs.
Say you have users with projects and projects with actions. Projects obviously has to have a user_id, but if actions are tied to one and only one project, then the user_id is redundant, and also violates normal form, since if it was to move to another user's project (probably not likely in your use cases), both the project FK and the user FK would have to be updated. Typically in multi-tenant scenarios, this isn't really a possible scenario, and so the primary key of every table is really a combination of tenant and a unique primary key "within" the tenant (which may also happen to be globally unique).
If you use natural keys extensively in your design, then clearly tenant+natural key is necessary so that each tenant's natural keys can be used. It's only when using surrogates like IDENTITY or GUIDs or sequences, that this becomes an issue, since it is tempting to make the IDENTITY the PK, after all, it is unique by definition.
Having the user_id in all tables does allow you to do certain things in views to enhance security (defense in depth), giving you a little bit of defensive programming (in SQL Server you can restrict all access through inline table valued function - essentially parametrized views - which require the app to specify user_id on every "table" access), and also allows you to easily scale out to multiple databases by forklifting everything on shared keys.
See this article for some interesting insights.
(In a massively multi-parallel paradigm like Teradata, the PRIMARY INDEX determines the amp on which the data lives, so I would think that this is a must to stop redistribution of rows to the other amps.)
In general, I would say you have a tenantid in each table, it should be the first column in the table, in most indexes and should be part of the primary key in most cases, unless otherwise justified. Where possible, it should be a required parameter in most stored procedures.
Generally, you use foreign keys to relate data between tables. In many cases, this foreign key is the user id. For example:
users
id
name
phonenumbers
user_id
phonenumber
So yes, that'd make perfect sense.
If a category can only belong to one user then yes, you need to include the user_id in the category table. If a category can belong to multiple people then you would have a separate table that maps category IDs to user IDs. You can still do this if you have a one to one mapping between the two, but there is no real reason for it.
You don't need to include the user_id in further tables if you can guarantee that those child tables will always be accessed via joining to the category table. If there is a chance that you will access them independantly of the category table then you should also have the user_id on those tables.
The extent to which to normalize can be a difficult decision. One of the best StackOverflow answers on this topic (Database Development Mistakes Made by App Developers) warns against both (1) failing to normalize, and (2) over-normalizing.
You mention that it might be easier "in the long run" to repeat the same data in multiple tables (that is, not to normalize that data). Look at the "Not simplifying complex queries through views" topic in the previous link. If you use views effectively, you will only have to do the 3 join query once when writing the view and then you can use a query with no joins for most purposes.
Most developers tend to under-normalize because it seems simpler. Go ahead and normalize. Use views to simplify your daily queries. When your requiremens get more complex or you decide to add features, you will be glad that you put time into a relational database design.
Alternatively, depending on your toolset, you may want to use a database abstraction layer that does the relational design under the covers while you manipulate higher level data object.
if it is Oracle, then you would probably set up a fine grained security rule to do the joins and prevent certain activities based on the existence of the original user id... (SELECT INSERT UPDATE DELETE etc)
You would need a map between the logged in user and the user_id. You could use uid, but then remember this umber may change if the database is reconstructed after some disaster...

Database Design for One to One relationships

I'm trying to finalize my design of the data model for my project, and am having difficulty figuring out which way to go with it.
I have a table of users, and an undetermined number of attributes that apply to that user. The attributes are in almost every case optional, so null values are allowed. Each of these attributes are one to one for the user. Should I put them on the same table, and keep adding columns when attributes are added (making the user table quite wide), or should I put each attribute on a separate table with a foreign key to the user table.
I have decided against using the EAV model.
Thanks!
Edit
Properties include thing like marital status, gender, age, first and last name, occupation, etc. All are optional.
Tables:
USERS
USER_PREFERENCE_TYPE_CODES
USER_PREFERENCES
USER_PREFERENCES is a many-to-many table, connecting the USERS and USER_PREFERENCE_TYPE_CODES tables. This will allow you to normalize the preference type attribute, while still being flexible to add preferences without needing an ALTER TABLE statement.
Could you give some examples of what kind of properties you'd want to add to the user table? As long as you stay below roughly 50 columns, it shouldn't be a big deal.
How ever, one way would be to split the data:
One table (users) for username, hashed_password, last_login, last_ip, current_ip etc, another table (profiles) for display_name, birth_day etc.
You'd link them either via the same id property or you'd add an user_id column to the other tables.
It depends.
You need to Look at what percentage of users will have that attribute. If the attribute is 'WalkedOnTheMoon' then split it out, if it is 'Sex' include it on the user's table. Also consider the number of columns on the base table, a few, 10-20, won't hurt that much.
If you have several related attributes you could group them into a common table: 'MedicalSchoolId', 'MedicalSpeciality', 'ResidencyHospitalId', etc. could be combined in UserMedical table.
Personally I would decide on whether there are natural groupings of attributes. You might put the most commonly queried in the user table and the others in a separate table with a one-to-one relationship to keep the table from being too wide (we usually call that something like User_Extended). If some of the attributes fall into natural groupings, they may call for a separate table because those attributes will usually be queried together.
In looking at the attributes, examine if some can be combined into one column (for instance if a user cannot simlutaneoulsy be three differnt things (say intern, resident, attending) but only one of them at a time, it is better to have one field and put the data into it rather than three bit fields that have to be transalted. This is especially true if you will need to use a case statement with all three fileds to get the information (say title) that you want in reporting. IN other words look over your attributes and see if they are truly separate or if they can be abstracted into a more general one.