Data separated by commas inside a field vs new table [duplicate] - sql

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Is storing a comma separated list in a database column really that bad?
This is a problem I often encounter when trying to expand the database.
For example:
I want to keep track of how many users saw a particular article in my website, so in the database I added a views field to the article table. Now, if I wanted to make sure that these are unique views then this is clearly not enough.
So let's say that in my design I can identify a user (or at least a computer) with a single number that's stored along with an IP or something.
Then, if I wanted to keep track of how many unique users saw a particular article which is the best way to go?
To create another table article_views with the fields article_id and
user_id.
To save the user_ids separated by commas inside the views field of the article table.

Never, ever, ever choose the separate-by-commas solution. It is a violation of every principle of database design. Create a separate table instead.
In your particular case, create the table with the PRIMARY KEY on (article_id, user_id). The database will then prohibit the entry of duplicate records. Depending on your SQL engine, you can additionally use INSERT OR IGNORE (or equivalent) to avoid throwing exceptions.
The other solution requires you to enforce the uniqueness in the all applications that touch the data.

Don't use comma separated values. Ever. Create a separate table that links all of those Ids to the article viewed.
Pretty simple design. It will be a table with two columns, both foreign keys. One to the article table and the other to the user table.

Have you considered using
SELECT COUNT(DISTINCT User + IP) As UniqueViews FROM Views GROUP BY ArticleID
If your views table contains duplicates like this records against User and IP (i.e. 10 per cilck per day or something), then COUTN(DISTINCT) will count the distinct occurences of them, not the number of records.

Related

advantages and disadvantages of database automatic number generator for each row vs manual numbering for each row

Imagine two tables that implemented like the following description:
The first table rows numbers created by database system administration automatically.
The second table rows numbers created manually by the programmer in a sequential order.
The main question is what are the advantages and disadvantages of these two approaches?
One distinct advantage of having the database manage auto-numbering over manually creating them is that the database implementation is thread safe - and manually creating them is usually (99.9% of the cases) is not (It's hard to do it correctly).
On the other hand, the database implementation does not guarantee sequential numbering - there can be gaps in the numbers.
Given these two facts, an auto-increment column should be used only as a surrogate key, when the values of this column does not have any business meaning - but they are simple used as a simple row identifier.
Please note that when using a surrogate key, it's best to also enforce uniqueness of a natural key - otherwise you might get rows where all the data is duplicated except the surrogate key.
When the database automatically create numbers, you habe less work.
Think about a sign up system, you have fields like name, email, password and so one:
1.) the number is generated by the database, so you can just insert the data into the table.
2.) if this is not the case you have to get the last number, so before the insert into you have to get the last id so instead a insert into you have a select + insert into.
Another reason is, what happened when you delete a row in your table?
Maybe in a forum, you want to delete the account but not all of his posts, so you can work with a workaround and when a post has a user_id not given you know this is/was a deleted or banned account - if you give a new user the number from a deleted user you will come in trouble.

What is the most correct way to store a "list" in a SQL Database?

So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, a column containing the AD group and the second column containing what list of computers belong to that list.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.
I not really a bad idea to store multiple values in one column, but will depend the search you want.
If you just only want to know the persons that is part of a group then you can store persons in one column with a group id as key. For update you just update the entire list in a group.
But if you want to search a specified person that belongs to group, then its not recommended that you store this multiple persons in one column. In this case its better to store a itermedium table that store person id, and group id.
Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.
a list has some columns like: name, family name, phone number etc.
and rows like name=john familyName= lee number=12321321
name=... familyname=... number=...
an sql database works same way. every row in a sql database is a record. so you jusr add records of your list into your database using insert query.
complete explanation in here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to eachother. In this situation, it is often recommended to use a mapping table, a.k.a. "junction table" or "cross-reference" table. This table consist solely of the two foreign keys in your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.

Is it better to have more or less tables in sql?

I have a large database, one of the tables is called Users.
There are two kinds of users in the database - basic users, and advanced users. In the users table, there are 29 columns. However, only 12 of these are applicable to the basic users - the other 17 columns are only used for advanced users, and for basic users they all just contain a value of null.
Is this an OK setup? Would it be more efficient to, say, split the two kinds of users into two different tables, or put all the extra fields that advanced users have in a separate table?
It's better to have the right amount of tables - this may be more or less, depending on your needs.
To your specific case, you should always start with third normal form and only revert to lesser forms when absolutely necessary (such as for performance) and only when you understand the consequences.
An attribute (column) belongs in a table if it is dependent on the key, the whole key and nothing but the key (so help me, Codd).
It's arguable whether your other 17 columns depend on the key in your user table but I would be separating them anyway just for the space saving.
Have your basic user table with the twelve columns (including a unique key of some sort) and your advanced user table with the other columns, and also that key so you can tie the rows from each together.
You could go even further and have a one to many relationship if your use case is that users can have any of the 17 attributes independent of each other but that doesn't seem to be what you've described.
It depends:
If the number of columns is large, then it will be more efficient to create two tables as you describe as you will not be reserving space for 17 columns which end up holding null.
You can always tack a view on the front which combines both tables, so your application code could be unaffected.
Yes its better to split up this table but not in two Its better to split in three table
User Table-
Contain common property of both user and Adavace user
UserID(PK)
UserName
Basic user -
Contains basic user property and have use primary key of user table and foreign key
USerID(FK) - from user table
BasicUsedetail
Advance user-
Contains Advance user property and have use primary key of user table and foreign key
USerID(FK) - from user table
AdvanceUsedetail
In this case, it's valid and more efficient to use 'single table per class hierarchy' in terms of speed to retrieve data but if you insert a BasicUser, it will reserve 17 columns per tuple just for nothing. This case is is so frequent that it is provided by ORMs such as Hibernate. Using this approach you avoid a join between tables which may be expensive depending the case.
The bad thing is that in case your design needs to scale in terms of types of users, you will need to add additional columns which many of them will be empty.
Usually it won't matter much, but if you got many many users and only a few of them are advanced user, it might be better to split. To my knowledge there are not exact rules of when to split and when not.

Is a one column table good design? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
It it ok to have a table with just one column? I know it isn't technically illegal, but is it considered poor design?
EDIT:
Here are a few examples:
You have a table with the 50 valid US state codes, but you have no need to store the verbose state names.
An email blacklist.
Someone mentioned adding a key field. The way I see it, this single column WOULD be the primary key.
In terms of relational algebra this would be a unary relation, meaning "this thing exists"
Yes, it's fine to have a table defining such a relation: for instance, to define a domain.
The values of such a table should be natural primary keys of course.
A lookup table of prime numbers is what comes to my mind first.
Yes, it's certainly good design to design a table in such a way as to make it most efficient. "Bad RDBMS Design" is usually centered around inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
I've used them in the past. One client of mine wanted to auto block anyone trying to sign up with a phone number in this big list he had so it was just one big blacklist.
If there is a valid need for it, then I don't see a problem. Maybe you just want a list of possibilities to display for some reason and you want to be able to dynamically change it, but have no need to link it to another table.
One case that I found sometimes is something like this:
Table countries_id, contains only one column with numeric ID for each country.
Table countries_description, contains the column with country ID, a column With language ID and a column with the localized country name.
Table company_factories, contains information for each factory of the company, including the country in Wich is located.
So to maintain data coherence and language independent data in the tables the database uses this schema with tables with only one column to allow foreign keys without language dependencies.
In this case I think the existence of one column tables are justified.
Edited in response to the comment by: Quassnoi
(source: ggpht.com)
In this schema I can define a foreign key in the table company_factories that does not require me to include Language column on the table, but if I don't have the table countries_id, I must include Language column on the table to define the foreign key.
There would be rare cases where a single-column table makes sense. I did one database where the list of valid language codes was a single-column table used as a foreign key. There was no point in having a different key, since the code itself was the key. And there was no fixed description since the language code descriptions would vary by language for some contexts.
In general, any case where you need an authoritative list of values that do not have any additional attributes is a good candidate for a one-column table.
I use single-column tables all the time -- depending, of course, on whether the app design already uses a database. Once I've endured the design overhead of establishing a database connection, I put all mutable data into tables where possible.
I can think of two uses of single-column tables OTMH:
1) Data item exists. Often used in dropdown lists. Also used for simple legitimacy tests.
Eg. two-letter U.S. state abbreviations; Zip codes that we ship to; words legal in Scrabble; etc.
2) Sparse binary attribute, ie., in a large table, a binary attribute that will be true for only a very few records. Instead of adding a new boolean column, I might create a separate table containing the keys of the records for which the attribute is true.
Eg. employees that have a terminal disease; banks with a 360-day year (most use 365); etc.
-Al.
Mostly I've seen this in lookup type tables such as the state table you described. However, if you do this be sure to set the column as the primary key to force uniqueness. If you can't set this value as unique, then you shouldn't be using one column.
No problem as long as it contains unique values.
I would say in general, yes. Not sure why you need just one column. There are some exceptions to this that I have seen used effectively. It depends on what you're trying to achieve.
They are not really good design when you're thinking of the schema of the database, but really should only be used as utility tables.
I've seen numbers tables used effectively in the past.
The purpose of a database is to relate pieces of information to each other. How can you do that when there is no data to relate to?
Maybe this is some kind of compilation table (i.e. FirstName + LastName + Birthdate), though I'm still not sure why you would want to do that.
EDIT: I could see using this kind of table for a simple list of some kind. Is that what you are using it for?
Yes as long as the field is the primary key as you said it would be. The reason is because if you insert duplicate data those rows will be readonly. If you try to delete one of the rows that are duplicated. it will not work because the server will not know which row to delete.
The only use case I can conceive of is a table of words perhaps for a word game. You access the table just to verify that a string is a word: select word from words where word = ?. But there are far better data structures for holding a list of words than a relational database.
Otherwise, data in a database is usually placed in a database to take advantage of the relationships between various attributes of the data. If your data has no attributes beyond its value how will these relationship be developed?
So, while not illegal, in general you probably should not have a table with just one column.
All my tables have at least four tech fields, serial primary key, creation and modification timestamps, and soft delete boolean. In any blacklist, you will also want to know who did add the entry. So for me, answer is no, a table with only one column would not make sense except when prototyping something.
Yes that is perfectly fine. but an ID field couldn't hurt it right?

What is the best way to add users to multiple groups in a database?

In an application where users can belong to multiple groups, I'm currently storing their groups in a column called groups as a binary. Every four bytes is a 32 bit integer which is the GroupID. However, this means that to enumerate all the users in a group I have to programatically select all users, and manually find out if they contain that group.
Another method was to use a unicode string, where each character is the integer denoting a group, and this makes searching easy, but is a bit of a fudge.
Another method is to create a separate table, linking users to groups. One column called UserID and another called GroupID.
Which of these ways would be the best to do it? Or is there a better way?
You have a many-to-many relationship between users and groups. This calls for a separate table to combine users with groups:
User: (UserId[PrimaryKey], UserName etc.)
Group: (GroupId[PrimaryKey], GroupName etc.)
UserInGroup: (UserId[ForeignKey], GroupId[ForeignKey])
To find all users in a given group, you just say:
select * from User join UserInGroup on UserId Where GroupId=<the GroupId you want>
Rule of thumb: If you feel like you need to encode multiple values in the same field, you probably need a foreign key to a separate table. Your tricks with byte-blocks or Unicode chars are just clever tricks to encode multiple values in one field. Database design should not use clever tricks - save that for application code ;-)
I'd definitely go for the separate table - certainly the best relational view of data. If you have indexes on both UserID and GroupID you have a quick way of getting users per group and groups per user.
The more standard, usable and comprehensible way is the join table. It's easily supported by many ORMs, in addition to being reasonably performant for most cases. Only enter in "clever" ways if you have a reason to, say a million of users and having to answer that question every half a second.
I would make 3 tables. users, groups and usersgroups which is used as cross-reference table to link users and groups. In usersgroups table I would add userId and groupId columns and make them as primary key. BTW. What naming conventions there are to name those xref tables?
It depends what you're trying to do, but if your database supports it, you might consider using roles. The advantage of this is that the database provides security around roles, and you don't have to create any tables.