Sphinx question: Structuring database - sql

I'm developing a job service that has features like radial search, full-text search, the ability to do full-text search + disable certain job listings (such as un-checking a textbox and no longer returning full-time jobs).
The developer who is working on Sphinx wants the database information to all be stored as intergers with a key (so under the table "Job Type" values might be stored such as 1="part-time" and 2="full-time")... whereas the other developers want to keep the database as strings (so under the table "Job Type" it says "part-time" or "full-time".
Is there a reason to keep the database as ints? Or should strings be fine?
Thanks!
Walker

Choosing your key can have a dramatic performance impact. Whenever possible, use ints instead of strings. This is called using a "surrogate key", where the key presents a unique and quick way to find the data, rather than the data standing on it's own.
String comparisons are resource intensive, potentially orders of magnitude worse than comparing numbers.
You can drive your UI off off the surrogate key, but show another column (such as job_type). This way, when you hit the database you pass the int in, and avoid looking through to the table to find a row with a matching string.
When it comes to joining tables in the database, they will run much faster if you have int's or another number as your primary keys.
Edit: In the specific case you have mentioned, if you only have two options for what your field may be, and it's unlikely to change, you may want to look into something like a bit field, and you could name it IsFullTime. A bit or boolean field holds a 1 or a 0, and nothing else, and typically isn't related to another field.

if you are normalizing your structure (i hope you are) then numeric keys will be most efficient.

Aside from the usual reasons to use integer primary keys, the use of integers with Sphinx is essential, as the result set returned by a successful Sphinx search is a list of document IDs associated with the matched items. These IDs are then used to extract the relevant data from the database. Sphinx does not return rows from the database directly.
For more details, see the Sphinx manual, especially 3.5. Restrictions on the source data.

Related

SQL many value in one var [duplicate]

So, per Mehrdad's answer to a related question, I get it that a "proper" database table column doesn't store a list. Rather, you should create another table that effectively holds the elements of said list and then link to it directly or through a junction table. However, the type of list I want to create will be composed of unique items (unlike the linked question's fruit example). Furthermore, the items in my list are explicitly sorted - which means that if I stored the elements in another table, I'd have to sort them every time I accessed them. Finally, the list is basically atomic in that any time I wish to access the list, I will want to access the entire list rather than just a piece of it - so it seems silly to have to issue a database query to gather together pieces of the list.
AKX's solution (linked above) is to serialize the list and store it in a binary column. But this also seems inconvenient because it means that I have to worry about serialization and deserialization.
Is there any better solution? If there is no better solution, then why? It seems that this problem should come up from time to time.
... just a little more info to let you know where I'm coming from. As soon as I had just begun understanding SQL and databases in general, I was turned on to LINQ to SQL, and so now I'm a little spoiled because I expect to deal with my programming object model without having to think about how the objects are queried or stored in the database.
Thanks All!
John
UPDATE: So in the first flurry of answers I'm getting, I see "you can go the CSV/XML route... but DON'T!". So now I'm looking for explanations of why. Point me to some good references.
Also, to give you a better idea of what I'm up to: In my database I have a Function table that will have a list of (x,y) pairs. (The table will also have other information that is of no consequence for our discussion.) I will never need to see part of the list of (x,y) pairs. Rather, I will take all of them and plot them on the screen. I will allow the user to drag the nodes around to change the values occasionally or add more values to the plot.
No, there is no "better" way to store a sequence of items in a single column. Relational databases are designed specifically to store one value per row/column combination. In order to store more than one value, you must serialize your list into a single value for storage, then deserialize it upon retrieval. There is no other way to do what you're talking about (because what you're talking about is a bad idea that should, in general, never be done).
I understand that you think it's silly to create another table to store that list, but this is exactly what relational databases do. You're fighting an uphill battle and violating one of the most basic principles of relational database design for no good reason. Since you state that you're just learning SQL, I would strongly advise you to avoid this idea and stick with the practices recommended to you by more seasoned SQL developers.
The principle you're violating is called first normal form, which is the first step in database normalization.
At the risk of oversimplifying things, database normalization is the process of defining your database based upon what the data is, so that you can write sensible, consistent queries against it and be able to maintain it easily. Normalization is designed to limit logical inconsistencies and corruption in your data, and there are a lot of levels to it. The Wikipedia article on database normalization is actually pretty good.
Basically, the first rule (or form) of normalization states that your table must represent a relation. This means that:
You must be able to differentiate one row from any other row (in other words, you table must have something that can serve as a primary key. This also means that no row should be duplicated.
Any ordering of the data must be defined by the data, not by the physical ordering of the rows (SQL is based upon the idea of a set, meaning that the only ordering you should rely on is that which you explicitly define in your query)
Every row/column intersection must contain one and only one value
The last point is obviously the salient point here. SQL is designed to store your sets for you, not to provide you with a "bucket" for you to store a set yourself. Yes, it's possible to do. No, the world won't end. You have, however, already crippled yourself in understanding SQL and the best practices that go along with it by immediately jumping into using an ORM. LINQ to SQL is fantastic, just like graphing calculators are. In the same vein, however, they should not be used as a substitute for knowing how the processes they employ actually work.
Your list may be entirely "atomic" now, and that may not change for this project. But you will, however, get into the habit of doing similar things in other projects, and you'll eventually (likely quickly) run into a scenario where you're now fitting your quick-n-easy list-in-a-column approach where it is wholly inappropriate. There is not much additional work in creating the correct table for what you're trying to store, and you won't be derided by other SQL developers when they see your database design. Besides, LINQ to SQL is going to see your relation and give you the proper object-oriented interface to your list automatically. Why would you give up the convenience offered to you by the ORM so that you can perform nonstandard and ill-advised database hackery?
You can just forget SQL all together and go with a "NoSQL" approach. RavenDB, MongoDB and CouchDB jump to mind as possible solutions. With a NoSQL approach, you are not using the relational model..you aren't even constrained to schemas.
What I have seen many people do is this (it may not be the best approach, correct me if I am wrong):
The table which I am using in the example is given below(the table includes nicknames that you have given to your specific girlfriends. Each girlfriend has a unique id):
nicknames(id,seq_no,names)
Suppose, you want to store many nicknames under an id. This is why we have included a seq_no field.
Now, fill these values to your table:
(1,1,'sweetheart'), (1,2,'pumpkin'), (2,1,'cutie'), (2,2,'cherry pie')
If you want to find all the names that you have given to your girl friend id 1 then you can use:
select names from nicknames where id = 1;
Simple answer: If, and only if, you're certain that the list will always be used as a list, then join the list together on your end with a character (such as '\0') that will not be used in the text ever, and store that. Then when you retrieve it, you can split by '\0'. There are of course other ways of going about this stuff, but those are dependent on your specific database vendor.
As an example, you can store JSON in a Postgres database. If your list is text, and you just want the list without further hassle, that's a reasonable compromise.
Others have ventured suggestions of serializing, but I don't really think that serializing is a good idea: Part of the neat thing about databases is that several programs written in different languages can talk to one another. And programs serialized using Java's format would not do all that well if a Lisp program wanted to load it.
If you want a good way to do this sort of thing there are usually array-or-similar types available. Postgres for instance, offers array as a type, and lets you store an array of text, if that's what you want, and there are similar tricks for MySql and MS SQL using JSON, and IBM's DB2 offer an array type as well (in their own helpful documentation). This would not be so common if there wasn't a need for this.
What you do lose by going that road is the notion of the list as a bunch of things in sequence. At least nominally, databases treat fields as single values. But if that's all you want, then you should go for it. It's a value judgement you have to make for yourself.
In addition to what everyone else has said, I would suggest you analyze your approach in longer terms than just now. It is currently the case that items are unique. It is currently the case that resorting the items would require a new list. It is almost required that the list are currently short. Even though I don't have the domain specifics, it is not much of a stretch to think those requirements could change. If you serialize your list, you are baking in an inflexibility that is not necessary in a more-normalized design. Btw, that does not necessarily mean a full Many:Many relationship. You could just have a single child table with a foreign key to the parent and a character column for the item.
If you still want to go down this road of serializing the list, you might consider storing the list in XML. Some databases such as SQL Server even have an XML data type. The only reason I'd suggest XML is that almost by definition, this list needs to be short. If the list is long, then serializing it in general is an awful approach. If you go the CSV route, you need to account for the values containing the delimiter which means you are compelled to use quoted identifiers. Persuming that the lists are short, it probably will not make much difference whether you use CSV or XML.
If you need to query on the list, then store it in a table.
If you always want the list, you could store it as a delimited list in a column. Even in this case, unless you have VERY specific reasons not to, store it in a lookup table.
Many SQL databases allow a table to contain a subtable as a component. The usual method is to allow the domain of one of the columns to be a table. This is in addition to using some convention like CSV to encode the substructure in ways unknown to the DBMS.
When Ed Codd was developing the relational model in 1969-1970, he specifically defined a normal form that would disallow this kind of nesting of tables. Normal form was later called First Normal Form. He then went on to show that for every database, there is a database in first normal form that expresses the same information.
Why bother with this? Well, databases in first normal form permit keyed access to all data. If you provide a table name, a key value into that table, and a column name, the database will contain at most one cell containing one item of data.
If you allow a cell to contain a list or a table or any other collection, now you can't provide keyed access to the sub items, without completely reworking the idea of a key.
Keyed access to all data is fundamental to the relational model. Without this concept, the model isn't relational. As to why the relational model is a good idea, and what might be the limitations of that good idea, you have to look at the 50 years worth of accumulated experience with the relational model.
I'd just store it as CSV, if it's simple values then it should be all you need (XML is very verbose and serializing to/from it would probably be overkill but that would be an option as well).
Here's a good answer for how to pull out CSVs with LINQ.
Only one option doesn't mentioned in the answers. You can de-normalize your DB design. So you need two tables. One table contains proper list, one item per row, another table contains whole list in one column (coma-separated, for example).
Here it is 'traditional' DB design:
List(ListID, ListName)
Item(ItemID,ItemName)
List_Item(ListID, ItemID, SortOrder)
Here it is de-normalized table:
Lists(ListID, ListContent)
The idea here - you maintain Lists table using triggers or application code. Every time you modify List_Item content, appropriate rows in Lists get updated automatically. If you mostly read lists it could work quite fine. Pros - you can read lists in one statement. Cons - updates take more time and efforts.
I was very reluctant to choose the path I finally decide to take because of many answers. While they add more understanding to what is SQL and its principles, I decided to become an outlaw. I was also hesitant to post my findings as for some it's more important to vent frustration to someone breaking the rules rather than understanding that there are very few universal truthes.
I have tested it extensively and, in my specific case, it was way more efficient than both using array type (generously offered by PostgreSQL) or querying another table.
Here is my answer:
I have successfully implemented a list into a single field in PostgreSQL, by making use of the fixed length of each item of the list. Let say each item is a color as an ARGB hex value, it means 8 char. So you can create your array of max 10 items by multiplying by the length of each item:
ALTER product ADD color varchar(80)
In case your list items length differ you can always fill the padding with \0
NB: Obviously this is not necessarily the best approach for hex number since a list of integers would consume less storage but this is just for the purpose of illustrating this idea of array by making use of a fixed length allocated to each item.
The reason why:
1/ Very convenient: retrieve item i at substring i*n, (i +1)*n.
2/ No overhead of cross tables queries.
3/ More efficient and cost-saving on the server side. The list is like a mini blob that the client will have to split.
While I respect people following rules, many explanations are very theoretical and often fail to acknowledge that, in some specific cases, especially when aiming for cost optimal with low-latency solutions, some minor tweaks are more than welcome.
"God forbid that it is violating some holy sacred principle of SQL": Adopting a more open-minded and pragmatic approach before reciting the rules is always the way to go. Else you might end up like a candid fanatic reciting the Three Laws of Robotics before being obliterated by Skynet
I don't pretend that this solution is a breakthrough, nor that it is ideal in term of readability and database flexibility, but it can certainly give you an edge when it comes to latency.
What I do is that if the List required to be stored is small then I would just convert it to a string then split it later when required.
example in python -
for y in b:
if text1 == "":
text1 = y
else:
text1 = text1 + f"~{y}"
then when I required it I just call it from the db and -
out = query.split('~')
print(out)
this will return a list, and a string will be stored in the db. But if you are storing a lot of data in the list then creating a table is the best option.
If you really wanted to store it in a column and have it queryable a lot of databases support XML now. If not querying you can store them as comma separated values and parse them out with a function when you need them separated. I agree with everyone else though if you are looking to use a relational database a big part of normalization is the separating of data like that. I am not saying that all data fits a relational database though. You could always look into other types of databases if a lot of your data doesn't fit the model.
I think in certain cases, you can create a FAKE "list" of items in the database, for example, the merchandise has a few pictures to show its details, you can concatenate all the IDs of pictures split by comma and store the string into the DB, then you just need to parse the string when you need it. I am working on a website now and I am planning to use this way.
you can store it as text that looks like a list and create a function that can return its data as an actual list. example:
database:
_____________________
| word | letters |
| me | '[m, e]' |
| you |'[y, o, u]' | note that the letters column is of type 'TEXT'
| for |'[f, o, r]' |
|___in___|_'[i, n]'___|
And the list compiler function (written in python, but it should be easily translatable to most other programming languages). TEXT represents the text loaded from the sql table. returns list of strings from string containing list. if you want it to return ints instead of strings, make mode equal to 'int'. Likewise with 'string', 'bool', or 'float'.
def string_to_list(string, mode):
items = []
item = ""
itemExpected = True
for char in string[1:]:
if itemExpected and char not in [']', ',', '[']:
item += char
elif char in [',', '[', ']']:
itemExpected = True
items.append(item)
item = ""
newItems = []
if mode == "int":
for i in items:
newItems.append(int(i))
elif mode == "float":
for i in items:
newItems.append(float(i))
elif mode == "boolean":
for i in items:
if i in ["true", "True"]:
newItems.append(True)
elif i in ["false", "False"]:
newItems.append(False)
else:
newItems.append(None)
elif mode == "string":
return items
else:
raise Exception("the 'mode'/second parameter of string_to_list() must be one of: 'int', 'string', 'bool', or 'float'")
return newItems
Also here is a list-to-string function in case you need it.
def list_to_string(lst):
string = "["
for i in lst:
string += str(i) + ","
if string[-1] == ',':
string = string[:-1] + "]"
else:
string += "]"
return string
Imagine your grandmother's box of recipes, all written on index cards. Each of those recipes is a list of ingredients, which are themselves ordered pairs of items and quantities. If you create a recipe database, you wouldn't need to create one table for the recipe names and a second table where each ingredient was a separate record. That sounds like what we're saying here. My apologies if I've misread anything.
From Microsoft's T-SQL Fundamentals:
Atomicity of attributes is subjective in the same way that the
definition of a set is subjective. As an example, should an employee
name in an Employees relation be expressed with one attribute
(fullname), two (firstname and lastname), or three (firstname,
middlename, and lastname)? The answer depends on the application. If
the application needs to manipulate the parts of the employee’s name
separately (such as for search purposes), it makes sense to break them
apart; otherwise, it doesn’t.
So, if you needed to manipulate your list of coordinates via SQL, you would need to split the elements of the list into separate records. But is you just wanted to store a list and retrieve it for use by some other software, then storing the list as a single value makes more sense.

How to design a table that only needs a column?

I am creating a database table that'll have a list of all Tags available in my application (just like SO's tags).
Currently, I don't have anything associated with each tag (and I'll probably never have), so my idea was to have something of the form
Tags (Tag(pk) : string)
Should this be the way to do it? Or should I instead do something like
Tags (tag_id(pk) : int, tag : string)
I guess looking up on the table in the 2nd case would be faster than in the first one, but that it also takes up more space?
Thanks
I'd go for the second option with the surrogate key.
It will mean the table takes up more space but will likely reduce space over all assuming that you have the tag information as a foreign key in other tables (e.g. a posts/tags table)
using an int rather than a string will make the lookups required to enforce the foreign key more efficient and mean that updates of tag titles don't need to affect multiple tables.
Indexes work better with integers than CHAR/VARCHAR, go with a dedicated integer primary key column. If you need tag names to be unique you can add a constraint, but it's probably not worth the hassle.
You should go for the second option. Firstly, you never know what the future holds. Secondly, you may later want multiple language support or other things that makes the string-as-the-primary-key have a strange feeling around it. Thirdly, I like the idea of using a standard procedure for a table definition, ie. that there always is a column 'id' or 'pk'. It separates business from technology.
Quite possibly you'll have a faster lookup with the index being an integer. Further, consider making your index clustered for even further speedup.
I wouldn't emphasize too much on the performance issue though. As soon as a program starts talking to a database over the internet, you have a much bigger delay than 99% of all the queries of your database (of course with the exception of reporting queries!).
Those two options achieve quite different things. In the first case you have unique tags and in the second you don't. You haven't said what use TAG_ID is in this model. Unless you put in TAG_ID for a good reason then I'd stick with the first design. It's smaller, appears to meet your requirements precisely and Tag seems like a more obvious choice for a key (on grounds of familiarity and simplicity).

Generally, are string (or varchar) fields used as join fields?

We have two tables. The first contains a name (varchar) field. The second contains a field that references the name field from the first table. This foreign key in the second table will be repeated for every row associated with that name. Is it generally discouraged to use a varchar/string field as a join between two tables? When is the best case where a string field can be used as a join field?
It's certainly possible to use a varchar as a key field (or simply something to join on). The main problems with it are based on what you normally store in a varchar field; mutable data. Strictly speaking, it's not advisable to have key fields change. A person's name, telephone number, even their SSN can all change. However, the employee with internal ID 3 will always be ID 3, even if there are two John Smiths.
Second, string comparison is dependent on a number of nit-picky details, such as culture, collation, whitespace translation, etc. that can break a join for no immediately-apparent reason. Say you use a tabspace character \t for a certain string you're joining on. Later, you change your software to replace \t with 3 spaces to reduce character escapes in your raw strings. You have now broken any functionality requiring a string with escaped tabs to be matched to an identical-looking, but differently-composed, string.
Lastly, even given two perfectly identical strings, there is a slight performance benefit to comparing two integer numbers than comparing two strings. Integer comparison is effectively constant-time. String comparison is linear at best, based on the length of the string.
Is it generally discouraged to use a varchar/string field as a join between two tables?
If there's a natural key to be used (extremely rare in real life, but state/province abbreviations are a good example), then VARCHAR fields are fine.
When is the best case where a string field can be used as a join field?
Depends on the database because of the bits allocated to the data type, but generally VARCHAR(4) or less takes around the same amount of space (less the less number of characters) as INT would.
Generally speaking you shouldn't use anything that is editable by the end users as a FK as an edit would require not one update, but one update per table which references that key.
Everyone else has already mentioned the potenetial performance implications of a query, but the update cost is also worth noting. I strongly suggest the use of a generated key instead.
If you're concerned about performance, the best way to know is to create tables that implement your potential design choices, then load them up with massive amounts of data to see what happens.
In theory, very small strings should perform as well as a number in joins. In practice, it would definitely depend upon the database, indexing, and other implementation choices.
In a relational database, you shouldn't use a string in one table that references the same string in another table. If the second table is a look-up, create an identity column for the table, and then reference the integer value in the first. When displaying the data, use a join to the second table. Just make sure in the second table you never actually delete records.
The only exception would be if you are creating an archive table where you want to store exactly what was chosen at a given time.
Sometimes a join will happen on fields that are not "join fields", because that's just the nature of the query (e.g. most ways of identifying records that are duplicates in a particular column). If the query you want relates to those values, then that's what the join will be on, end of story.
If a field genuinely identifies a row, then it is possible to use it as a key. It's even possible to do so if it could change (it brings issues, but not insurmountable issues) as long as it remains a genuine identifier (it'll never change to a value that exists for another row).
The performance impact varies by common query and by database. By database the type of indexing strategies of some makes them better at using varchar and other textual keys than other databases (in particular, hash-indices are nice).
Common queries can be such that it becomes more performant to use varchar even without hash indices. A classic example is storing pieces of text for a multi-lingual website. Each such piece of text will have a particular languageID relating to the language it is in. However, obtaining other information about that language (it's name etc.) is rarely needed; what's much more often needed is to either filter by the RFC 5646 code, or to find out what that RFC 6546 code is. If we use a numeric id, then we will have to join for both types of query to obtain that code. If we use the code as the ID, then the most common queries concerned with the language won't need to look in the language table at all. Most queries that do care about the details of the language also won't need to do any join; pretty much the only time the key will be used as a foreign key is in maintaining referential integrity on update and insert of text or on deletion of languages. Hence while the join is less efficient when it is used the system as a whole will be more efficient by using fewer joins.
It depends on the nature of your data.
If the string is some user-entered and updated value then I would probably shy away from joining on it. You may run into consistency difficulties for storing the name in both the parent and the detail table.
Nothing has duplicate names?
I have used a string field as a join when using GUIDs or single char identifiers or when I know the string to be a natural key (though I almost always prefer a surrogate)
Natural primary keys like a zip code, phone number, email address or user name are by definition strings. There are unique and relatively short.
If you put an index on such a column there is no problem with using them a join. Impact on performance will usually be minimal.

MySQL Table with TEXT column

I've been working on a database and I have to deal with a TEXT field.
Now, I believe I've seen some place mentioning it would be best to isolate the TEXT column from the rest of the table(putting it in a table of its own).
However, now I can't find this reference anywhere and since it was quite a while ago, I'm starting to think that maybe I misinterpreted this information.
Some research revealed this, suggesting that
Separate text/blobs from metadata, don't put text/blobs in results if you don't need them.
However, I am not familiar with the definition of "metadata" being used here.
So I wonder if there are any relevant advantages in putting a TEXT column in a table of its own. What are the potential problems of having it with the rest of the fields? And potential problems of keeping it in a separated table?
This table(without the TEXT field) is supposed to be searched(SELECTed) rather frequently. Is "premature optimization considered evil" important here? (If there really is a penalty in TEXT columns, how relevant is it, considering it is fairly easy to change this later if needed).
Besides, are there any good links on this topic? (Perhaps stackoverflow questions&answers? I've tried to search this topic but I only found TEXT vs VARCHAR discussions)
Yep, it seems you've misinterpreted the meaning of the sentence. What it says is that you should only do a SELECT including a TEXT field if you really need the contents of that field. This is because TEXT/BLOB columns can contain huge amounts of data which would need to be delivered to your application - this takes time and of course resources.
Best wishes,
Fabian
This is probably premature optimisation. Performance tuning MySQL is really tricky and can only be done with real performance data for your application. I've seen plenty of attempts to second guess what makes MySQL slow without real data and the result each time has been a messy schema and complex code which will actually make performance tuning harder later on.
Start with a normalised simple schema, then when something proves too slow add a complexity only where/if needed.
As others have pointed out the quote you mentioned is more applicable to query results than the schema definition, in any case your choice of storage engine would affect the validity of the advice anyway.
If you do find yourself needing to add the complexity of moving TEXT/BLOB columns to a separate table, then it's probably worth considering the option of moving them out of the database altogether. Often file storage has advantages over database storage especially if you don't do any relational queries on the contents of the TEXT/BLOB column.
Basically, get some data before taking any MySQL tuning advice you get on the Internet, including this!
The data for a TEXT column is already stored separately. Whenever you SELECT * from a table with text column(s), each row in the result-set requires a lookup into the text storage area. This coupled with the very real possibility of huge amounts of data would be a big overhead to your system.
Moving the column to another table simply requires an additional lookup, one into the secondary table, and the normal one into the text storage area.
The only time that moving TEXT columns into another table will offer any benefit is if there it a tendency to usually select all columns from tables. This is merely introducing a second bad practice to compensate for the first. It should go without saying the two wrongs is not the same as three lefts.
The concern is that a large text field—like way over 8,192 bytes—will cause excessive paging and/or file i/o during complex queries on unindexed fields. In such cases, it's better to migrate the large field to another table and replace it with the new table's row id or index (which would then be metadata since it doesn't actually contain data).
The disadvantages are:
a) More complicated schema
b) If the large field is using inspected or retrieved, there is no advantage
c) Ensuring data consistency is more complicated and a potential source of database malaise.
There might be some good reasons to separate a text field out of your table definition. For instance, if you are using an ORM that loads the complete record no matter what, you might want to create a properties table to hold the text field so it doesn't load all the time. However if you are controlling the code 100%, for simplicity, leave the field on the table, then only select it when you need it to cut down on data trasfer and reading time.
Now, I believe I've seen some place mentioning it would be best to isolate the TEXT column from the rest of the table(putting it in a table of its own).
However, now I can't find this reference anywhere and since it was quite a while ago, I'm starting to think that maybe I misinterpreted this information.
You probably saw this, from the MySQL manual
http://dev.mysql.com/doc/refman/5.5/en/optimize-character.html
If a table contains string columns such as name and address, but many queries do not retrieve those columns, consider splitting the string columns into a separate table and using join queries with a foreign key when necessary. When MySQL retrieves any value from a row, it reads a data block containing all the columns of that row (and possibly other adjacent rows). Keeping each row small, with only the most frequently used columns, allows more rows to fit in each data block. Such compact tables reduce disk I/O and memory usage for common queries.
Which indeed is telling you that in MySQL you are discouraged from keeping TEXT data (and BLOB, as written elsewhere) in tables frequently searched

How does row design influence MySQL performance?

I got an users table and some forum, where users can write. Every action on forum uses users table. User can have a profile, which can be quite big (50KB). If I got such big data in each row wouldn't it be faster to have separate table with user's profiles and other data that aren't accessed very often?
In an online RPG game each character have a long list of abilities, for example: pistols experience, machine guns experience, throwing grenades experience, and 15 more. Is it better to store them in a string as numbers separated with semicolon - which would take more space than integers, or should I make for each ability individual field? Or maybe binary? (I use c++)
If you don't need the data from
specific columns, don't get it.
Don't do SELECT * but SELECT a,
b,...
If you need to do SQL-queries over
certain columns e.g. ORDER BY
pistols_experience, you should
leave it in different columns. If
you just display it all at once, you
could serialize the different
key-value-pairs into a text field
via YAML, JSON etc.
(1) Not in itself, no. As stefan says, you should be selecting only what you want, so having stuff you don't want in the table is no issue. A 50K TEXT blob is only a pointer in the row.
However, there can be an issue if you are using MyISAM tables. In MyISAM there is only table-level locking, so when you have one user update their row (eg. last visit time), it blocks all other users from accessing the table. In this case you might experience some improvement by breaking out heavily-updated columns into a separate table from the relatively static but heavily-selected ones.
But you don't want to be using MyISAM anyway: it's a bit crap. Use InnoDB, get row-level locking (and transactions, and foreign key constraints), and don't worry about it. The only reason to use MyISAM tables today is for fulltext search, which InnoDB doesn't support.
(2) You would normally separate every independent value into its own field. If you hit a real performance issue and you don't need to do database-level manipulation of the values on their own, you could consider denormalising it, but you'd be losing the power of the database.