Designing Database to Support Multi-Language - sql

I am attempting to introduce multi-language support into the back end of an application, and trying to figure out a strategy to efficiently implement this into the current schema.
I currently have multiple tables, all of which include standard English values.
My idea is, in each table with English values, to include a foreign key that relates to a single "Language" entry, with the multiple "Language values" (one per language) living in a table holding the translations.
So, for example:
Table-1
Value                     | Lang_ID
--------------------------|--------
"This is a sentence"      | 1
"This is also a sentence" | 2
"Translate this"          | 3

Table-2
Value                                   | Lang_ID
----------------------------------------|--------
"This is a sentence from another table" | 4
"This table is different from table-1"  | 5

Language-Table
Lang_ID | Lang_Code | Value
--------|-----------|-----------------------------------------
1       | "ZHO"     | "这是一句话"
1       | "SPA"     | "esta es una frase"
2       | "FRA"     | "c'est aussi une phrase"
3       | "SPA"     | "traduce esto"
4       | "FRA"     | "ceci est une phrase d'un autre tableau"
....
My thinking is you would then just need to query Language-Table with WHERE Lang_ID=? AND Lang_Code=? to get the translation for that specific value.
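For illustration, the lookup might look like this (a sketch using the example data above; the table is written as Language_Table since a hyphenated name would need quoting):

SELECT Value
FROM Language_Table
WHERE Lang_ID = 2          -- the text being translated
  AND Lang_Code = 'FRA';   -- the target language
-- returns "c'est aussi une phrase" for the example data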
I'm wondering a couple of things:
1) Is this a good practice?
2) How do I generate foreign key values that don't exist yet, across multiple tables, while keeping each one unique (so that two translations of different text values never fall under the same Lang_ID)?

Using another table to store the translations is a good idea; I have used it at work as well as for personal projects. In my case, I designed for translation from the beginning, so I have no string values in my base tables: even the default English values live in the translation table.
I have also put the language codes in a separate table, though I have never found a situation where I needed that, so I suggest you keep them where they are.
For the unique id, I suggest using a sequence like so (shamelessly stolen here):
create sequence sequence_name
  start 1
  increment 1
  no maxvalue
  cache 1;
And use nextval('sequence_name') whenever inserting a new field to translate.
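A minimal usage sketch, assuming PostgreSQL and the example table names from the question:

insert into table_1 (value, lang_id)
values ('A brand new sentence', nextval('sequence_name'));
-- every row to translate, in any table, draws its Lang_ID from the same
-- sequence, so no two texts can ever share an ID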

Open sums in SQL / dynamic selection of tables

Much ink has been spilled on the topic of sum types in SQL. The standard solutions are called absorption, separation, and partition; see, e.g.: https://www.inf.unibz.it/~montali/teaching/1415/dpm/slides/4.relational-mapping.pdf .
I want to ask about how to encode open sums. Normal sums allow a field to be one of a fixed set of several different types; with open sums, this set is not fixed.
The basic setup in our program: There is a list of "triggers," where each trigger can be one of many different things. Plugins can be written defining new trigger types, although the set of trigger types can be assumed to be known at compile time.
We want a table of all triggers.
Our current best idea:
Part 1: Dynamically create a materialized view of the following form:
id | id_in_plugin_table | thing_in_main_program_it_refs | plugin_name
---------------------------------------------------------------------
1 | 27 | 8 | RegexTrigger
2 | 27 | 12 | RidiculouslyUnsafeCustomJSTrigger
This relation is automatically generated from the various plugin tables, each of which has its own ID and a thing_in_main_program_it_refs field.
For illustration, here's what the referenced tables may look like.
RegexTrigger table:
id | thing_in_main_program_it_refs | regex
---------------------------------------------------------------------
27 | 8 | hel*o
RidiculouslyUnsafeCustomJSTrigger table:
id | thing_in_main_program_it_refs | custom_js
---------------------------------------------------------------------
27 | 12 | (x) => isPrime(x.length())
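For reference, part 1's view could be generated along these lines (a sketch assuming PostgreSQL syntax; the snake_case plugin table names are hypothetical, and deriving the surrogate id with row_number() is just one option):

create materialized view triggers as
select
  row_number() over (order by plugin_name, id_in_plugin_table) as id,
  id_in_plugin_table,
  thing_in_main_program_it_refs,
  plugin_name
from (
  select id as id_in_plugin_table,
         thing_in_main_program_it_refs,
         'RegexTrigger' as plugin_name
  from regex_trigger
  union all
  select id,
         thing_in_main_program_it_refs,
         'RidiculouslyUnsafeCustomJSTrigger'
  from ridiculously_unsafe_custom_js_trigger
) as plugin_rows;

Each new plugin contributes one more union all branch, which is why the view has to be (re)created dynamically.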
Part 2: Either use two roundtrips (look up the plugin table in this view, then query it), or combine them into a single SQL program which uses EXEC.
I'm happy with part 1, but not with part 2. Neither option sounds efficient, and the latter option uses EXEC.
So, we're looking for either (a) a better way to dynamically select a table in a query, or (b) a different approach to open sums.

Transpose variable number of rows into columns in OpenRefine

I have an xml file containing records from a library catalogue. I have imported it into OpenRefine but all the values are in one column. I want to transpose it so each field in the record has its own column. However, this is complicated by the fact that a) each field is optional so does not exist in all records and b) many fields are repeatable so can appear multiple times in each record. Here's a simplified example of what the data looks like:
| RecordID | Tag | Data |
| 1 | 040a | CaABCD |
| 1 | 245a | Go fish |
| 1 | 245a | A guide to fish |
| 1 | 246i | Fish series |
| 1 | 260a | Fishing friends |
| 2 | 040a | CaABDC |
| 2 | 245a | Happy trails |
| 2 | 246i | Hiking series |
| 2 | 260i | The happy hiker |
| 2 | 500a | Notes |
I have read the Q&A here: Openrefine - Transpose rows into columns based on text, but the problem with that solution is that if I concatenate all the values together, I have no way to be sure which field they belong to anymore, as my data is much more complicated than the data in that question (my actual data has 25+ fields and many thousands of records).
I was able to get closer using Google Sheets and making a pivot table with a calculated field (as in PivotTable to show values, not sum of values - see the answer at the very bottom). However, I still don't know how to handle the repeating fields. In the pivot table the multiple values are there but only the first displays (double-clicking on an individual cell brings up a details table which lists all the values), so when I copy-paste the table I lose the additional values. I would like to concatenate them but I cannot see a way to do so within the pivot table.
Can you think of any other way I could do this, in OpenRefine or another tool? Thanks!
The classic way to fix this in OpenRefine is to use "Transpose -> Columnize by key value". But this feature is poorly documented and can cause headaches even for OpenRefine developers. In your case, repeated fields will be problematic, so here is a possible solution.
1° Go to the "tag" column, click on "Transpose -> Columnize by key value" and use the following configuration (don't forget the "Note column (optional)")
The result will look like this (my dataset is not exactly the same as yours; I modified a value to do some testing):
2° In the new column "Record ID: 040 a", click on "edit column -> Move Column To Beginning".
3° If you want to merge the repeated fields, go to each column that contains them and click on "Edit cells -> Join multi-valued cells", choosing a separator, for example "|".
The end result will look like this.
To get rid of unnecessary columns: click on Export -> Custom tabular exporter and deselect the columns whose names start with RecordID.
OpenRefine also has a native MARC importer which might be something worth trying if you need to work with MARC data in the future. MARCEdit also has some specific OpenRefine support built in.

sqlite variable and unknown number of entries in column

I am sure this question has been asked before, but I'm so new to SQL, I can't even combine the correct search terms to find an answer! So, apologies if this is a repetition.
The DB I'm creating has to be built at run-time, with the data entered after creation. Some fields will have a varying number of entries, but the number is unknown at creation time.
I'm struggling to come up with a db design to handle this variation.
As an (anonymised) example, please see below:
| salad_name | salad_type | salad_ingredients | salad_cost |
| apple | fruity | apple | cheap |
| unlikely | meaty | sausages, chorizo | expensive |
| normal | standard | leaves, cucumber, tomatoes | mid |
As you can see, the contents of "salad_ingredients" varies.
My thoughts were:
Just enter a single, comma-separated string and split it at run-time. Seems hacky, and I couldn't search by salad_ingredients!
Have another table per salad, such as "apple_ingredients", with a varying number of rows, one per ingredient. However, I can't do this, because I don't know the salad_name at creation time! :(
Have a separate salad_ingredients table, where each row is a salad_name, and there is an arbitrary number of ingredient fields, say 10, so you could have up to 10 ingredients. Again, seems slightly hacky, as I don't like unused fields, and what happens if a super-complicated salad comes along?
Is there a solution that I've missed?
Thanks,
Dan
In my experience, the best solution is a normalized set of tables:

table salads
  id
  salad_name
  salad_type
  salad_cost

table ingredients
  id
  name

table salad_ingredients
  id
  id_salad
  id_ingredients

where id_salad is the corresponding id from salads, and id_ingredients is the corresponding id from ingredients.
Using proper joins you can get (SELECT) and filter (WHERE) all the values you need.
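A minimal sketch in SQLite (column types and constraints are my assumptions):

create table salads (
  id         integer primary key,
  salad_name text not null,
  salad_type text,
  salad_cost text
);

create table ingredients (
  id   integer primary key,
  name text not null unique
);

create table salad_ingredients (
  id             integer primary key,
  id_salad       integer not null references salads(id),
  id_ingredients integer not null references ingredients(id)
);

-- e.g. find every salad containing tomatoes
-- (note: SQLite only enforces the references above with PRAGMA foreign_keys = ON)
select s.salad_name
from salads s
join salad_ingredients si on si.id_salad = s.id
join ingredients i on i.id = si.id_ingredients
where i.name = 'tomatoes';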

Storing a COUNT of values in a table

I have a table with data along the (massively simplified) lines of:
User | Value
-----|------
UsrA | 100
UsrA | 102
UsrB | 100
UsrA | 100
UsrB | 101
and, for reasons far too obscure to go into, I need to store the COUNT of each value in a table for future retrieval - ending up with something like
User | Value100Count | Value101Count | Value102Count
-----|---------------|---------------|--------------
UsrA | 2 | 0 | 1
UsrB | 1 | 1 | 0
However, there could be up to 255 different Values - meaning potentially 255 different ValueXCount columns. I know this is a horrible way to do things, but is there an easy way to get the data into a format that can be easily INSERTed into the destination table? Is there a better way to store the COUNT of values per user (unfortunately I do need to store this information; grabbing it from the source table each time isn't an option)?
The whole thing isn't very pretty, but you know that. Rather than your table with 255 columns, I'd consider setting up another table with:
User | Value | CountOfValue
And set a primary key over User and Value.
You could then insert the counts for given user/value combos into the CountOfValue field.
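Populating it is then one aggregate query (a sketch; source_values and user_value_counts are hypothetical names, and User is bracketed because it is a reserved word in T-SQL):

insert into user_value_counts ([User], Value, CountOfValue)
select [User], Value, count(*)
from source_values
group by [User], Value;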
As I said, the design is horrible and it feels like you would be better off starting from scratch, normalizing and doing counts live.
Check out indexed views. You can have the table maintained automatically, with integrity, and as a bonus it can get used in queries that already do COUNT(*) on that data.
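A sketch of what that could look like on SQL Server (dbo.SourceValues is a hypothetical name; indexed views require SCHEMABINDING, COUNT_BIG, and a unique clustered index):

create view dbo.UserValueCounts
with schemabinding
as
select [User], Value, count_big(*) as CountOfValue
from dbo.SourceValues
group by [User], Value;
go
create unique clustered index IX_UserValueCounts
  on dbo.UserValueCounts ([User], Value);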

MySQL Database Design with Internationalization

I'm going to start work on a medium-sized application, and I'm planning its DB design.
One thing that I'm not sure about is this.
I will have many tables which will need internationalization, such as: "membership_options, gender_options, language_options etc"
Each of these tables will share common i18n fields, like:
"title, alternative_title, short_description, description"
In your opinion which is the best way to do it?
Have an i18n table with the same fields for each of the tables that will need them?
or do something like:
Membership table
----------------
id | created_at
1  | 22.03.2001
2  | 22.03.2001

Gender table
------------
id | created_at
1  | 14.08.2002
2  | 14.08.2002

General translation table
-------------------------
record_id | table_name | string_name | alternative_title | .... | id_language
1         | membership | regular     | null              |      | 1 (english)
1         | membership | normale     | null              |      | 2 (italian)
1         | gender     | man         | null              |      | 1 (english)
1         | gender     | uomo        | null              |      | 2 (italian)
This would save me from repeating something like:
membership_translation table
----------------------------
membership_id | name    | alternative_title | id_lang
1             | regular | null              | 1
1             | normale | null              | 2

gender_translation table
------------------------
gender_id | name | alternative_title | id_lang
1         | man  | null              | 1
1         | uomo | null              | 2
and so on. That would probably reduce the number of DB tables, but I'm not sure about performance. I'm not much of a DB designer, so please let me know.
The most common way I've seen this done is with two tables, membership and membership_ml, with one storing the base values and the _ml table storing the localized strings. This is similar to your second option. Most of the systems I see like this are made that way because they weren't designed with internationalization in mind from the get-go, so the extra _ml tables were "tacked on" later.
What I think is a better option is similar to your first option, but a little bit different. You would still have a central "Content" table storing all the translations, but instead of putting the table name and field name in there, you would use tokens. That way you can also enforce some kind of RI between the tokens in the base table and the translations in the Content table if you want.
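A rough sketch of that token-based layout in MySQL (all names here are hypothetical, not taken from an existing system):

create table i18n_token (
  token varchar(64) primary key
);

create table i18n_content (
  token             varchar(64) not null,
  lang_code         char(2)     not null,
  title             varchar(255),
  alternative_title varchar(255),
  short_description varchar(500),
  description       text,
  primary key (token, lang_code),
  foreign key (token) references i18n_token (token)
);

create table membership_options (
  id            int auto_increment primary key,
  content_token varchar(64) not null,
  created_at    datetime,
  foreign key (content_token) references i18n_token (token)
);

-- fetch the Italian strings for all membership options
select m.id, c.title, c.alternative_title
from membership_options m
join i18n_content c
  on c.token = m.content_token
 and c.lang_code = 'it';

The i18n_token table exists purely so that both the base tables and the content table can hold real foreign keys to the same set of tokens.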
I actually asked a question about this very thing a while back, so you can have a look at that for some more info (rather than repasting the schema examples here).
I also think the best solution is to keep translations in a different table. This approach is used by OpenCart, which is open source, so you can take a look at the way it deals with the problem. Another source of information is http://www.gsdesign.ro/blog/multilanguage-database-design-approach/, especially the comments section.