What are best practices for multi-language database design? [closed] - sql

What is the best way to create a multi-language database? Creating a localized table for every table makes the design and querying complex, while adding a column for each language is simple but not dynamic. Please help me understand the best choice for enterprise applications.

What we do is create two tables for each multilingual object.
E.g. the first table contains only language-neutral data (primary key, etc.) and the second table contains one record per language, containing the localized data plus the ISO code of the language.
In some cases we add a DefaultLanguage field, so that we can fall-back to that language if no localized data is available for a specified language.
Example:
Table "Product":
----------------
ID : int
<any other language-neutral fields>
Table "ProductTranslations"
---------------------------
ID : int (foreign key referencing the Product)
Language : varchar (e.g. "en-US", "de-CH")
IsDefault : bit
ProductDescription : nvarchar
<any other localized data>
With this approach, you can handle as many languages as needed (without having to add additional fields for each new language).
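For illustration, a minimal DDL sketch of that layout (T-SQL syntax and the Price column are assumptions; the other names follow the example above):

CREATE TABLE Product (
    ID    int NOT NULL PRIMARY KEY,
    Price money NOT NULL                        -- placeholder for the language-neutral fields
);

CREATE TABLE ProductTranslations (
    ID                 int           NOT NULL,  -- foreign key referencing the Product
    Language           varchar(10)   NOT NULL,  -- e.g. 'en-US', 'de-CH'
    IsDefault          bit           NOT NULL,
    ProductDescription nvarchar(500) NOT NULL,
    PRIMARY KEY (ID, Language),
    FOREIGN KEY (ID) REFERENCES Product (ID)
);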
Update (2014-12-14): please have a look at this answer for some additional information about the implementation used to load multilingual data into an application.

I recommend the answer posted by Martin.
But you seem to be concerned about your queries getting too complex:
To create localized table for every table is making design and querying complex...
So you might be thinking that instead of writing simple queries like this:
SELECT price, name, description FROM Products WHERE price < 100
...you would need to start writing queries like that:
SELECT
    p.price, pt.name, pt.description
FROM
    Products p JOIN ProductTranslations pt
    ON (p.id = pt.id AND pt.lang = 'en')
WHERE
    p.price < 100
Not a very pretty perspective.
But instead of doing it manually, you should develop your own database access class that pre-parses SQL containing your special localization markup and converts it to the actual SQL you need to send to the database.
Using that system might look something like this:
db.setLocale("en");
db.query("SELECT p.price, _(p.name), _(p.description)
          FROM _(Products p) WHERE price < 100");
And I'm sure you can do even better than that.
The key is to have your tables and fields named in a uniform way.

I find this type of approach works for me:
Product       ProductDetail        Country
=========     ==================   =========
ProductId     ProductDetailId      CountryId
- etc -       ProductId            CountryName
              CountryId            Language
              ProductName          - etc -
              ProductDescription
              - etc -
The ProductDetail table holds all the translations (for product name, description etc..) in the languages you want to support. Depending on your app's requirements, you may wish to break the Country table down to use regional languages too.
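A read against this layout might look like the following sketch (table and column names as in the diagram above; the language value is just a placeholder):

SELECT p.ProductId, pd.ProductName, pd.ProductDescription
FROM Product p
JOIN ProductDetail pd ON pd.ProductId = p.ProductId
JOIN Country c        ON c.CountryId  = pd.CountryId
WHERE c.Language = 'de';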

I'm using the following approach:
Product
    ProductID, OrderID, ...
ProductInfo
    ProductID, Title, Name, LanguageID
Language
    LanguageID, Name, Culture, ...
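A lookup against this layout is a join through the Language table, along these lines (a sketch; the Culture value is a placeholder):

SELECT p.ProductID, pi.Title, pi.Name
FROM Product p
JOIN ProductInfo pi ON pi.ProductID = p.ProductID
JOIN Language l     ON l.LanguageID = pi.LanguageID
WHERE l.Culture = 'en-US';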

Martin's solution is very similar to mine; however, how would you handle a default description when the desired translation isn't found?
Would that require an IFNULL() and another SELECT statement for each field?
The default translation is stored in the same table; a flag like "IsDefault" indicates whether that description is the default one to use when none is found for the current language.
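That fallback can be expressed in a single query rather than an extra SELECT per field. A sketch against the Product/ProductTranslations example above, where COALESCE takes the requested language when a row exists and the default row otherwise (the language value is a placeholder):

SELECT p.ID,
       COALESCE(req.ProductDescription, dflt.ProductDescription) AS ProductDescription
FROM Product p
LEFT JOIN ProductTranslations req
       ON req.ID = p.ID AND req.Language = 'fr-FR'   -- requested language
LEFT JOIN ProductTranslations dflt
       ON dflt.ID = p.ID AND dflt.IsDefault = 1;     -- fallback row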

Related

SQL Server - Q&A Engine

I'm investigating the possibility of writing a web-based question and answers engine to assist my girlfriend in her line of work.
What I mean by a "question and answers" engine is that the database (and ultimately the user interface) can present many styles of question to an end user - not just multiple choice; the questions could be "fill in the blank(s)", "sqrt(189) equals", "change the words in the paragraph", or a "word-search" style question with many correct answers.
As a more specific example, a question might be "fill in the missing words: 'the ____ brown ____ jumps over the ____ dog'", with the user presented with various answers. Likewise, another example could be "list as many words as you can make from the word: 'antidisestablishmentarianism'".
Having already done some searches on Stack Overflow, I found that similar questions have been raised, but none seemed to fit the mould of what I'm after.
As I see it, I can have a single questions table...
TABLE Questions
QuestionID UNIQUE INT
Question NTEXT // The question text (could include markup for web)
Options NTEXT // Could be a CSV list of possible options (for multi-choice type questions)
QuestionTypeID INT // ID of the question type record (more for the user-interface)
TABLE QuestionTypes
QuestionTypeID UNIQUE INT
Description NVARCHAR(100)
TABLE Answers
AnswerID UNIQUE INT
UserID INT // Related User record
QuestionID INT // Related Question record
Answers NTEXT // Could this be a CSV list of answers??
I can see how the above would work for very simple, single or multiple choice answers but I'm not sure how it would work for different styles of question.
Another thought I had was having different tables handling different QuestionTypeID values, for example:
TABLE QuestionType1 // Table styled to question type 1
QuestionType1ID UNIQUE INT
QuestionID INT // Related question record
...
TABLE QuestionType2 // Table styled to question type 2
QuestionType2ID UNIQUE INT
QuestionID INT // Related question record
...
But I can't really see how this would work. Am I thinking about this or approaching it in the wrong way?
I'd appreciate any help (if the question makes sense in the first place)!
Regards,
K

Storing Exam Questions in a Database

I've been thinking about how I should design a database to hold exam questions for a little over a year now (on and off, mostly off).
First, a short description of what I'm after. I would like to design a database flexible enough to store different question types (for example, short response or multiple choice questions), and be able to select any number of those questions to be stored as an exam.
My question is:
How should the exam questions be stored?
Since different question types require different fields, if I were to put them all in the same questions table, there would be a lot of extra fields that are never used.
If I separate the question types into different tables, it'll be a lot harder to store the question_id in some exam_questions table, since they will come from different tables.
I also can't think of a flexible way to store the information.
For example,
questions
- id
- question
- type (multiple choice, short response)
- choice_1 ( used for multiple choice questions)
- choice_2
- choice_3
- choice_4
- choice_5
- answer (short response answer here, or just a/b/c/d/e for multiple choice, t/f for true or false)
Would a design like this be recommended? If not, does anyone have any advice?
I also have another question:
If I want to store student responses to one of these exams, would it be inefficient to store them like this?
exam_responses
- id
- student_id
- exam_id
- question_id or question_number
- response
- mark
Thank you in advance. If you would like any other information, or if there is anything wrong with this question, please leave me a comment and I'll try and have it fixed as soon as possible.
I would have separate question and answer tables and join them using a question2answer table:
question
--------
- id
- question
- type (multiple choice, short response)
answer
------
- id
- GUIorder (so you can change the sort order on the front end)
- answer
question2answer
---------------
- questionid
- answerid
Now everything is dynamic in terms of building the question and you don't have empty columns.
A quick join brings back all the parts of the main question.
Your exam_responses table can now have the following:
- id
- questionid
- answerid
- studentid
- response
- mark
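A rough DDL sketch of that layout, plus the retrieval join (generic SQL; the types, lengths, and literal id are assumptions):

CREATE TABLE question (
    id       int PRIMARY KEY,
    question varchar(1000) NOT NULL,
    type     varchar(50)   NOT NULL              -- multiple choice, short response, ...
);

CREATE TABLE answer (
    id       int PRIMARY KEY,
    GUIorder int NOT NULL,                       -- sort order on the front end
    answer   varchar(1000) NOT NULL
);

CREATE TABLE question2answer (
    questionid int NOT NULL REFERENCES question (id),
    answerid   int NOT NULL REFERENCES answer (id),
    PRIMARY KEY (questionid, answerid)
);

-- "A quick join brings back all the parts of the main question":
SELECT q.question, a.answer
FROM question q
JOIN question2answer qa ON qa.questionid = q.id
JOIN answer a           ON a.id = qa.answerid
WHERE q.id = 42                                  -- placeholder question id
ORDER BY a.GUIorder;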
I think storing five fixed choice columns in the questions table is not a good design. If the number of choices for a question grows to six, you will have to add one more column; and if there are just two choices, the remaining columns stay empty. So it is better to use two separate tables:
questions
- id
- question
- type (multiple choice, short response)
- answer (short response answer here, or just a/b/c/d/e for multiple choice, t/f for true or false)
question_choices
- id
- question_id
- choice
Then you can get the list of choices for each particular question by joining them on the questions.id = question_choices.question_id condition.
In the case of exam responses, you should also split them into two tables, so that the student_id, exam, and mark are not repeated for every question of the exam. So it should be something like:
student_exam
- id
- student_id
- exam_id
- mark
student_exam_responses
- exam_id
- question_id or question_number
- response
These two tables can be connected based on the student_exam.id=student_exam_responses.exam_id condition.
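Both joins mentioned above might look roughly like this (a sketch; the literal ids are placeholders):

SELECT q.question, c.choice
FROM questions q
JOIN question_choices c ON c.question_id = q.id
WHERE q.id = 7;

SELECT se.student_id, se.exam_id, se.mark,
       r.question_id, r.response
FROM student_exam se
JOIN student_exam_responses r ON r.exam_id = se.id
WHERE se.student_id = 1;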

How to localize sql server data?

We have a requirement to develop an application that supports multiple languages (English, German, French, Russian). We know we can use ASP.NET localization to localize the static text of a web form, but what would be the approach for localizing data in a SQL Server database?
For example, my database schema is something like this:
Table - Questions
    QID - PK
    Question
    CreatedBy
Table - Answer
    AID - PK
    QID - FK
    Answer
    AddedBy
In the above schema, I want the "Question" column from the Questions table and the "Answer" column from the Answer table to hold localized values.
How do I achieve this?
Add a Language table:
LanguageID-PK
LanguageIdentifier (name as accepted by CultureInfo's constructor, e.g. "de" for German)
Add a TranslatedQuestion table:
TQID-PK
QID-FK
LanguageID
Translation
Likewise, add a TranslatedAnswer table:
TAID-PK
AID-FK
LanguageID
Translation
This way, of course, there is nothing in the data model to guarantee that every question/answer has a translation for a given language. But you can always fall back to the untranslated question/answer.
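That fallback could be a single LEFT JOIN with COALESCE, along these lines (a sketch in T-SQL; @LanguageID is an assumed parameter, and the names follow the schema above):

SELECT q.QID,
       COALESCE(tq.Translation, q.Question) AS QuestionText
FROM Questions q
LEFT JOIN TranslatedQuestion tq
       ON tq.QID = q.QID
      AND tq.LanguageID = @LanguageID;   -- the caller's language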
Add a culture column to the table, then repeat the questions and answers in the culture specific format.

Internationalization in .NET. Database driven? Resources interactively?

I am working on the internationalization of a CMS in .NET (VB.NET). In the administration part we used resources, but for the clients we want a UI so someone can see the labels and translate them from one language to another.
So, the first thought was to do it database driven with three tables:
Label        Translation    Language
-----        -----------    --------
id           id             id
name         keyname_id     name
filename     language_id
             value
And then create a UI so the client can first select the filename of the page to translate and the label, then select the language he wants and enter the translation, which would be stored in the translations table.
I see here a problem: How would I take from the page all the labels?
I also spotted an example of a resources manager that can translate in an interactive way. This is the example.
The benefit of this solution is that you are working with resources, so everything seems easier because some of the work is already done. On the other hand, this structure can be more difficult to implement; I don't know, as I'm not experienced with this.
So, what do you think about these two approaches? What is better for you? Maybe there is a third approach that is easier?
EDIT: I also found this link about the resource-provider model. What do you think about it? Maybe it can be useful, but I don't know; maybe it's too much for my purposes. I am still thinking about where to start.
In LedgerSMB, we went with the following approach:
Application strings (and strings in code) get translated by a standard i18n framework (GNU gettext basically).
Business data gets manual translation in the database, so you can add translations to department names, project names, descriptions of goods and services, etc.
Our approach to the problem you describe is to join against translation tables, so we might have:
CREATE TABLE parts (
    id int primary key,          -- autoincrements, but not relevant to this example
    description text,
    ...
);
CREATE TABLE language (
    code varchar(5) primary key, -- like en_US
    name text unique
);
CREATE TABLE parts_translation (
    parts_id int not null references parts(id),
    language_code varchar(5) not null references language(code),
    translation text
);
Then we can query based on the desired language at run time.
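For example, a query along these lines returns the translated description where one exists and falls back to the untranslated parts.description otherwise (the language code is only a placeholder):

SELECT p.id,
       COALESCE(pt.translation, p.description) AS description
FROM parts p
LEFT JOIN parts_translation pt
       ON pt.parts_id = p.id
      AND pt.language_code = 'de_DE';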

Recommended SQL database design for tags or tagging [closed]

I've heard of a few ways to implement tagging: using a mapping table between TagID and ItemID (makes sense to me, but does it scale?); adding a fixed number of possible TagID columns to ItemID (seems like a bad idea); keeping tags in a comma-separated text column (sounds crazy, but it could work). I've even heard someone recommend a sparse matrix, but then how do the tag names grow gracefully?
Am I missing a best practice for tags?
Three tables (one for storing all items, one for all tags, and one for the relation between the two), properly indexed, with foreign keys set, running on a proper database, should work well and scale properly.
Table: Item
Columns: ItemID, Title, Content
Table: Tag
Columns: TagID, Title
Table: ItemTag
Columns: ItemID, TagID
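A minimal DDL sketch of those three tables, with the foreign keys and an index for tag lookups (generic SQL; the types are assumptions):

CREATE TABLE Item (
    ItemID  int PRIMARY KEY,
    Title   varchar(255) NOT NULL,
    Content varchar(8000)                        -- or a larger text type, depending on the database
);

CREATE TABLE Tag (
    TagID int PRIMARY KEY,
    Title varchar(255) NOT NULL UNIQUE
);

CREATE TABLE ItemTag (
    ItemID int NOT NULL REFERENCES Item (ItemID),
    TagID  int NOT NULL REFERENCES Tag (TagID),
    PRIMARY KEY (ItemID, TagID)
);

-- Supports the "all items for one tag" lookup
CREATE INDEX IX_ItemTag_TagID ON ItemTag (TagID);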
Normally I would agree with Yaakov Ellis but in this special case there is another viable solution:
Use two tables:
Table: Item
Columns: ItemID, Title, Content
Indexes: ItemID
Table: Tag
Columns: ItemID, Title
Indexes: ItemId, Title
This has some major advantages:
First, it makes development much simpler: in the three-table solution, to insert or update an item you have to look up the Tag table to see whether entries already exist, and then join them with the new ones. This is no trivial task.
Second, it makes queries simpler (and perhaps faster). There are three major database queries you will run: output all Tags for one Item, draw a Tag-Cloud, and select all Items for one Tag Title.
All Tags for one Item:
3-Table:
SELECT Tag.Title
FROM Tag
JOIN ItemTag ON Tag.TagID = ItemTag.TagID
WHERE ItemTag.ItemID = :id
2-Table:
SELECT Tag.Title
FROM Tag
WHERE Tag.ItemID = :id
Tag-Cloud:
3-Table:
SELECT Tag.Title, count(*)
FROM Tag
JOIN ItemTag ON Tag.TagID = ItemTag.TagID
GROUP BY Tag.Title
2-Table:
SELECT Tag.Title, count(*)
FROM Tag
GROUP BY Tag.Title
Items for one Tag:
3-Table:
SELECT Item.*
FROM Item
JOIN ItemTag ON Item.ItemID = ItemTag.ItemID
JOIN Tag ON ItemTag.TagID = Tag.TagID
WHERE Tag.Title = :title
2-Table:
SELECT Item.*
FROM Item
JOIN Tag ON Item.ItemID = Tag.ItemID
WHERE Tag.Title = :title
But there are some drawbacks, too: it could take more space in the database (which could mean more disk operations, which is slower), and it's not normalized, which could lead to inconsistencies.
The size argument is not that strong, because tags are by nature pretty small, so the size increase is not a large one. One could argue that the query for a tag title is much faster in a small table that contains each tag only once, and this is certainly true. But the savings from not having to join, plus the fact that you can build a good index on the tags, could easily compensate for this. This of course depends heavily on the size of the database you are using.
The inconsistency argument is a little moot, too. Tags are free-text fields, and there is no expected operation like 'rename all tags "foo" to "bar"'.
So, tl;dr: I would go for the two-table solution. (In fact, I'm going to; I found this article while checking whether there are valid arguments against it.)
If you are using a database that supports map-reduce, like CouchDB, storing tags in a plain text field or list field is indeed the best way. Example:
tagcloud: {
    map: function(doc){
        for(tag in doc.tags){
            emit(doc.tags[tag], 1)
        }
    },
    reduce: function(keys, values){
        return values.length
    }
}
Running this with group=true will group the results by tag name, and even return a count of the number of times that tag was encountered. It's very similar to counting the occurrences of a word in text.
Use a single formatted text column [1] for storing the tags, and use a capable full-text search engine to index it. Otherwise you will run into scaling problems when trying to implement boolean queries.
If you need details about the tags you have, you can either keep track of them in an incrementally maintained table or run a batch job to extract the information.
[1] Some RDBMSs even provide a native array type, which might be even better suited for storage because no parsing step is needed, but it might cause problems with full-text search.
I've always kept the tags in a separate table and then had a mapping table. Of course I've never done anything on a really large scale either.
Having a "tags" table and a map table makes it pretty trivial to generate tag clouds & such since you can easily put together SQL to get a list of tags with counts of how often each tag is used.
I would suggest the following design:
Item table:
Itemid, taglist1, taglist2
This will be fast and makes saving and retrieving the data at the item level easy.
In parallel, build another table:
Tags
tag
Do not make the tag a unique identifier, and if you run out of space in the second column (which contains, let's say, 100 items), create another row.
Searching for items for a tag will then be super fast.