I am working on the internationalization of a CMS in .NET (VB.NET). In the administration part we used resources, but for the clients we want a UI so someone can see the labels and translate them from one language to another.
So my first thought was to make it database-driven, with three tables:
Label       Translation    Language
--------    -----------    --------
id          id             id
name        keyname_id     name
filename    language_id
            value
Then I would create a UI that lets the client first select the filename of the page to translate, then the label, then the target language, and enter the translation, which would be stored in the Translation table.
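Here is a minimal sketch of the three-table idea. It is written in Python, with the standard-library sqlite3 module standing in for SQL Server; the VB.NET application would issue similar SQL through its own data access layer. The column names follow the layout above. The UNIQUE constraints, the sample page contact.aspx, the label btnSave, and the labels_for_page helper are hypothetical additions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the CMS database (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE label (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    filename TEXT NOT NULL,
    UNIQUE (filename, name)          -- one label name per page
);
CREATE TABLE translation (
    id          INTEGER PRIMARY KEY,
    keyname_id  INTEGER NOT NULL REFERENCES label(id),
    language_id INTEGER NOT NULL REFERENCES language(id),
    value       TEXT NOT NULL,
    UNIQUE (keyname_id, language_id) -- one translation per label and language
);
-- Hypothetical sample data.
INSERT INTO language VALUES (1, 'English'), (2, 'Spanish');
INSERT INTO label VALUES (1, 'btnSave', 'contact.aspx');
INSERT INTO translation VALUES (1, 1, 1, 'Save'), (2, 1, 2, 'Guardar');
""")


def labels_for_page(filename, language):
    """Return {label name: translated text} for one page in one language."""
    rows = conn.execute(
        """
        SELECT l.name, t.value
        FROM label AS l
        JOIN translation AS t ON t.keyname_id = l.id
        JOIN language AS lang ON lang.id = t.language_id
        WHERE l.filename = ? AND lang.name = ?
        """,
        (filename, language),
    )
    return dict(rows.fetchall())


print(labels_for_page("contact.aspx", "Spanish"))  # {'btnSave': 'Guardar'}
```

The UNIQUE (keyname_id, language_id) constraint keeps the editor from saving two competing translations for the same label and language.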
I see one problem here: how would I extract all the labels from a page?
I also found an example of a resource manager that supports interactive translation.
The benefit of this solution is that you are working with resources, so everything seems easier because some of the work is already done. On the other hand, this approach may be more difficult to implement; I can't tell, as I'm not experienced with it.
So, what do you think about these two approaches? Which would you choose? Or is there an easier third approach?
EDIT: I also found this link about the resource-provider model. What do you think about it? It might be useful, but it may be too much for my purposes. I'm still deciding where to start.
In LedgerSMB, we went with the following approach:
Application strings (including strings in code) are translated with a standard i18n framework (basically GNU gettext).
Business data is translated manually in the database, so you can add translations for department names, project names, descriptions of goods and services, etc.
Our approach to the problem you describe is to join translation tables, so we might have:
CREATE TABLE parts (
id int primary key, -- autoincrements (details omitted), not relevant to this example
description text,
...
);
CREATE TABLE language (
code varchar(5) primary key, -- like en_US
name text unique
);
CREATE TABLE parts_translation (
parts_id int not null references parts(id),
language_code varchar(5) not null references language(code),
translation text,
primary key (parts_id, language_code)
);
Then we can query based on desired language at run time.
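The run-time query might look like the following Python/sqlite3 sketch of the schema above. Falling back to the untranslated description with COALESCE is my own assumption about the desired behavior, not necessarily what LedgerSMB does. The parts_in helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE language (code VARCHAR(5) PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE parts_translation (
    parts_id      INTEGER NOT NULL REFERENCES parts(id),
    language_code VARCHAR(5) NOT NULL REFERENCES language(code),
    translation   TEXT,
    PRIMARY KEY (parts_id, language_code)
);
-- Hypothetical sample data: only the screwdriver has a German translation.
INSERT INTO parts VALUES (1, 'Hammer'), (2, 'Screwdriver');
INSERT INTO language VALUES ('en_US', 'English'), ('de_DE', 'German');
INSERT INTO parts_translation VALUES (2, 'de_DE', 'Schraubendreher');
""")


def parts_in(language_code):
    """List parts in the requested language, falling back to the base text."""
    return conn.execute(
        """
        SELECT p.id, COALESCE(pt.translation, p.description)
        FROM parts AS p
        LEFT JOIN parts_translation AS pt
               ON pt.parts_id = p.id AND pt.language_code = ?
        ORDER BY p.id
        """,
        (language_code,),
    ).fetchall()


print(parts_in("de_DE"))  # [(1, 'Hammer'), (2, 'Schraubendreher')]
```

The language filter sits in the ON clause rather than in WHERE, so untranslated parts still appear instead of silently dropping out.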
I'm new to SQL and I'm currently thinking about an effective way to build out my database. It's a language learning application and I'm torn between two approaches:
Keeping all of my words, regardless of their language, in one giant words table
Splitting my words into separate tables based on their language, e.g. words_french, words_italian, etc.
In the second scenario, are there approaches I can use (perhaps within Postgres) that would allow me to target the words_french table when I'm working through French lessons/content and need to look up associated French words?
I imagine there would be some sort of concatenation, like words_${language}, and for now I figure I'd have to resolve that in JS or somewhere else on the frontend.
Also, is splitting words and other content into per-language tables (table_language) even a valid approach?
Any ideas?
Use Option 1. Option 2 would be horribly difficult to work with.
Word table:
WordId    Word    Language
------    ----    --------
1         a       English
2         un      French
As Dimitar Spasovski suggests, if you need additional attributes associated with each language, you should also have a Language table. Then replace the Language column in the Word table with LanguageId to create the relationship.
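Option 1 also answers the words_${language} question: the language becomes a query parameter instead of part of a table name. Here is a minimal sketch in Python with sqlite3; Postgres would accept the same SQL with its own placeholder style. The index name, the words_for helper, and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (
    language_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE word (
    word_id     INTEGER PRIMARY KEY,
    word        TEXT NOT NULL,
    language_id INTEGER NOT NULL REFERENCES language(language_id)
);
-- Index so per-language lookups stay fast as the single table grows.
CREATE INDEX word_language_idx ON word(language_id);
INSERT INTO language VALUES (1, 'English'), (2, 'French');
INSERT INTO word VALUES (1, 'a', 1), (2, 'un', 2);
""")


def words_for(language):
    """Return the words of one language; the language is a bound parameter."""
    rows = conn.execute(
        """
        SELECT w.word
        FROM word AS w
        JOIN language AS l ON l.language_id = w.language_id
        WHERE l.name = ?
        ORDER BY w.word_id
        """,
        (language,),
    )
    return [row[0] for row in rows]


print(words_for("French"))  # ['un']
```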
Watching or reading some online courses on data modeling or data architecture will also help.
Is there any documentation listing the SQL data types available in service.xml?
What value should I use to make the configuration compatible with the following table structure:
CREATE TABLE SAMPLE_TABLE (
NAME varchar(100) NOT NULL,
DESCRIPTION varchar(300) DEFAULT NULL,
CREATE_DATE timestamp
)
It always feels funny to link to the Service Builder DTD (here the HTML version; see the declaration at the top of your service.xml for the actual file), but you can learn a lot from it: it contains far more documentation than actual DTD code, so it's human-readable for anybody working with Service Builder. In the DTD you'll find the attributes for declaring that you want to connect to an existing table instead of creating one through Service Builder's default mechanisms.
There's another file which, AFAIK, sadly does not have a DTD: model-hints.xml. Luckily, there's a developer guide chapter on it, which contains quite a bit of extra information, such as the maximum length of VARCHAR fields. Use this file for additional validation or to specify more details for the columns that get autogenerated in your entity tables.
I'm looking for some advice on SQL naming conventions. I know this topic has been discussed before but my question is a little more specific and I cannot find an answer elsewhere.
I have some integer variables that would generally have a name like 'Timeout'. Is there an adopted standard for prefixing or suffixing the name so that I know what it contains when I come back to it in six months' time?
For instance, should it be 'TimeoutMilliseconds'?
I'm not talking about labelling every variable this way, just those with generic values.
Look up ISO/IEC 11179 for the international database naming standard. You can grab it online as a free download (sorry, I forget where). There is a lot in it, so here is a basic summary:
Take your field description, remove joining words and write it backwards.
Always end with a class name. There are standard abbreviations like ID for identifier and such.
e.g.:
Date of Entry: Entry_Date
Seconds for Delivery: Delivery_Seconds
Name of Widget: Widget_Name
Location of Widget: Widget_Location
Size of Widget: Widget_Size
Also, a primary key and the foreign keys that reference it should have the same column name. This pays off in readability for the people who come after you, and most DB tools will assume such columns are matching keys, so you will also save time with reporting tools and the like (less manual fiddling to add links by hand).
In the above examples, the class names are Date, Seconds, Name, Location, and Size. It surprises me that this ISO standard is not better known.
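The "remove joining words and write it backwards" rule can be sketched in a few lines of Python. The JOINING_WORDS set and the element_name function are hypothetical; the standard does not define this exact word list.

```python
# Hypothetical set of joining words to drop; ISO/IEC 11179 does not define
# this exact list.
JOINING_WORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "by"}


def element_name(description):
    """Drop joining words, reverse the remaining words, and join them with
    underscores, so the class word (Date, Seconds, ...) ends up last."""
    words = [w.capitalize() for w in description.split()
             if w.lower() not in JOINING_WORDS]
    return "_".join(reversed(words))


for text in ("Date of Entry", "Seconds for Delivery", "Milliseconds of Timeout"):
    print(text, "->", element_name(text))
# Date of Entry -> Entry_Date
# Seconds for Delivery -> Delivery_Seconds
# Milliseconds of Timeout -> Timeout_Milliseconds
```

Simple reversal fits the two-word examples above. For a longer description such as "Name of Widget Owner" it produces Owner_Widget_Name, so treat the output as a starting point rather than a final name.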
What I'm looking for is a breakdown of table names with their corresponding fields/types.
The Bible I want to store will be in English and needs to support the following:
Books
Chapters
Section Titles (can show up within verses and in-between verses)
Smallcaps Text
Red Letter Text
Verse Numbers
Footnotes (can show up within verses and within section titles) (may optionally reference another verse)
Cross-references (essentially a footnote that only references another verse and doesn't add any commentary)
Anything else I'm forgetting
Here is another collection / example for you:
https://github.com/scrollmapper/bible_databases
Here you will find SQL, XML, CSV, and JSON. Of special note are the cross-reference table (quite extensive and amazing) and a simple verse-id system for fast queries.
EDIT: Notice that the table ids are book-chapter-verse combinations, so they are always unique.
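Assume the ids follow a book * 1,000,000 + chapter * 1,000 + verse layout, so Genesis 1:1 would be 1001001 (check the repository's data before relying on this). Under that assumption, packing and unpacking them is simple arithmetic. The function names are hypothetical, and book 43 is John in the usual 66-book numbering.

```python
# Assumed id layout: book * 1,000,000 + chapter * 1,000 + verse,
# e.g. 1001001 for Genesis 1:1 (verify against the actual data).
def verse_id(book, chapter, verse):
    """Pack a book/chapter/verse reference into one sortable integer."""
    if not (1 <= chapter <= 999 and 1 <= verse <= 999):
        raise ValueError("chapter and verse must be between 1 and 999")
    return book * 1_000_000 + chapter * 1_000 + verse


def split_verse_id(vid):
    """Unpack an id produced by verse_id() into (book, chapter, verse)."""
    book, rest = divmod(vid, 1_000_000)
    chapter, verse = divmod(rest, 1_000)
    return book, chapter, verse


print(verse_id(43, 3, 16))       # 43003016 (John 3:16)
print(split_verse_id(43003016))  # (43, 3, 16)
```

Because such ids sort in reading order, a whole chapter becomes a range query, e.g. WHERE id BETWEEN 43003001 AND 43003999 for John 3.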
SQL is the BEST way to store this. Given your requirements, we can split them into two major parts:
Information that depends on the individual version
  Small caps
  Red letter print
Information that doesn't depend on the individual version
  Book, chapter, verse numbers
  Section title
  Footnotes (?)
  Cross-references
  Commentary
For various reasons, I prefer to store the whole Bible project in one SINGLE table; yes, call it bible.
I have stored nearly 15 versions of the Bible in a single table, with each version kept as its own column. When you add more versions in the future, the table grows horizontally, which is okay because the number of rows remains constant (31,102). Also, consider the convenience of making the combination (book, chapter, verse) the PRIMARY key, because in most situations that's how verses are looked up.
That said, here is my recommended table structure.
CREATE TABLE IF NOT EXISTS `bible` (
`id` int(11) NOT NULL AUTO_INCREMENT, -- Globally unique number of the verse
`book` varchar(25) NOT NULL, -- Book, chapter, verse form the combined primary key
`chapter` int(11) NOT NULL,
`verse` int(11) NOT NULL,
`section_title` varchar(250) NOT NULL, -- Section title; a section starts at this verse and spans the following verses until the next non-empty section_title
`foot_note` varchar(1000) NOT NULL, -- Store footnotes here
`cross_reference` int(11) NOT NULL, -- Integer/array of integers: just store the `id`s of related verses
`commentary` text NOT NULL, -- Commentary; keep adding more columns for commentaries by different authors
`AMP` text NOT NULL, -- Keep, keep, keep adding columns, and good luck with future expansion
`ASV` text NOT NULL,
`BENG` text NOT NULL,
`CEV` text NOT NULL,
PRIMARY KEY (`book`,`chapter`,`verse`),
KEY `id` (`id`)
);
Oh, what about the small caps and red letters?
Well, you can store small caps and red letters in the version columns using HTML or another appropriate format. In the interface, you can strip them out depending on whether the user wants red letters or small caps.
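Here is a minimal Python sketch of stripping the formatting per user choice. It assumes the version columns mark red letters and small caps with non-nested <span class="red"> and <span class="sc"> elements. Those class names, the render_verse helper, and the sample text are hypothetical; nested or more varied markup would call for a real HTML parser.

```python
import re

# Assumed storage format: red letters and small caps are inline HTML spans,
# <span class="red">...</span> and <span class="sc">...</span>, never nested.
RED = re.compile(r'<span class="red">(.*?)</span>', re.DOTALL)
SMALL_CAPS = re.compile(r'<span class="sc">(.*?)</span>', re.DOTALL)


def render_verse(html, red_letter, small_caps):
    """Strip the formatting the user switched off, keeping the words."""
    if not red_letter:
        html = RED.sub(r"\1", html)
    if not small_caps:
        html = SMALL_CAPS.sub(r"\1", html)
    return html


sample = 'He said, <span class="red">Follow me.</span> The <span class="sc">Lord</span> is near.'
print(render_verse(sample, red_letter=False, small_caps=True))
# He said, Follow me. The <span class="sc">Lord</span> is near.
```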
For reference, you can download the SQL files below and customize them your way:
Bibles in SQL format
Rather than reinventing the wheel, you might consider using a "Bible SDK" such as AV Bible, which stores text, formatting, verse numbers, etc. in an open, custom binary format.
I think they have everything you've listed except cross-references.
I also found http://www.lyricue.org/downloads/, which includes several Bible translations in MySQL format.
This repository contains the entire Bible in SQL:
https://github.com/godlytalias/Bible-Database
Everything in WernerCD's answer, but store the verseText as XML so you can add formatting tags such as <red>Red Text</red> and use the tags to format it in your application.
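Here is a minimal sketch of using those tags at display time, with Python's xml.etree.ElementTree. The <red>/<sc> tag names, the CSS class names, and verse_to_html are hypothetical. The sketch handles only top-level tags; nested tags are flattened to their text. Stored text must be valid XML, so a literal & has to be saved as &amp;.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from stored tags to CSS classes.
TAG_CLASSES = {"red": "red-letter", "sc": "small-caps"}


def verse_to_html(verse_text):
    """Convert stored verse XML such as 'said, <red>...</red>' to HTML spans.

    Only top-level tags are mapped; nested tags are flattened to their text.
    """
    root = ET.fromstring(f"<verse>{verse_text}</verse>")  # XML needs one root
    parts = [escape(root.text or "")]
    for child in root:
        inner = escape("".join(child.itertext()))
        css = TAG_CLASSES.get(child.tag)
        parts.append(f'<span class="{css}">{inner}</span>' if css else inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)


print(verse_to_html("He said, <red>Follow me.</red>"))
# He said, <span class="red-letter">Follow me.</span>
```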
Mark Rushakoff's answer is probably the best for your specific need. More generally, though, if you need to store content that either has data within it or has data about it, a Content Management System is typically used. You can build your own (WernerCD's answer had a table structure for that) or use a CMS product. The list here shows the wide variety of technologies used (around 30 in this list use MySQL).
Expanding the DB horizontally isn't very efficient, given the potential for very large tables and complex updates. A layout of id, book, chapter, verse, V1, V2, V3, V4 ... Vn looks at the problem like a spreadsheet rather than taking advantage of what a DB has to offer.
The references are static (book, chapter, verse), so they can be populated in one table with an id, and with that you have the framework of the entire Bible. The verse content can potentially have hundreds of versions, so it is better stored in its own table, linked by a foreign key to the references. The structure would be primary_id, foreign_id, version, content.
Now the content just fills in as needed. There is no need for thousands of empty fields that you have to go back and fill in later, or to expand the table and backfill all the existing data every time you add a new version. Just fill in the verses as you get them, which I think works better if you are building it yourself.
This also makes sense because some versions only have the NT, or omit verses they consider later additions, so there is no need for empty fields: you just have the data, and it links to the verse reference. "version" can also be a foreign key to a table with more information about the version, like a publishing date or a long/short name (e.g. "NIV", "New International Version"). This also works well when using more than one revision of a translation, like the 1984 NIV vs. the 2011 NIV. Both can be identified as "NIV" but differ in content, so the version_id can link to another table with expanded information about the revision in use. Once that data is in and linked properly, you can display it however you wish, for example combining the publishing date and short name into a name like "NIV1984", or having a separate unique column for a display name.
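Here is a minimal Python/sqlite3 sketch of that normalized layout: a static verse table, a version table, and a content table linking the two. The table and column names, the NT-only version "XNT", and the placeholder texts (not real verse content) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Static framework: one row per verse reference, shared by every version.
CREATE TABLE verse (
    verse_id INTEGER PRIMARY KEY,
    book     TEXT    NOT NULL,
    chapter  INTEGER NOT NULL,
    verse    INTEGER NOT NULL,
    UNIQUE (book, chapter, verse)
);
-- One row per translation revision, e.g. the 1984 and 2011 NIV.
CREATE TABLE version (
    version_id   INTEGER PRIMARY KEY,
    short_name   TEXT NOT NULL,
    long_name    TEXT NOT NULL,
    publish_year INTEGER
);
-- Verse text per version; a verse missing from a version simply has no row.
CREATE TABLE verse_content (
    content_id INTEGER PRIMARY KEY,
    verse_id   INTEGER NOT NULL REFERENCES verse(verse_id),
    version_id INTEGER NOT NULL REFERENCES version(version_id),
    content    TEXT    NOT NULL,
    UNIQUE (verse_id, version_id)
);
INSERT INTO verse VALUES (1, 'Genesis', 1, 1), (2, 'John', 3, 16);
INSERT INTO version VALUES
    (1, 'NIV', 'New International Version', 1984),
    (2, 'NIV', 'New International Version', 2011),
    (3, 'XNT', 'Hypothetical NT-only version', NULL);
-- Placeholder texts, not real verse content.
INSERT INTO verse_content VALUES
    (1, 1, 1, '[Genesis 1:1, NIV 1984]'),
    (2, 1, 2, '[Genesis 1:1, NIV 2011]'),
    (3, 2, 1, '[John 3:16, NIV 1984]'),
    (4, 2, 3, '[John 3:16, XNT]');
""")


def versions_of(book, chapter, verse_no):
    """Return (display name, text) for every version that has this verse."""
    return conn.execute(
        """
        SELECT v.short_name || COALESCE(v.publish_year, ''), c.content
        FROM verse_content AS c
        JOIN verse AS r ON r.verse_id = c.verse_id
        JOIN version AS v ON v.version_id = c.version_id
        WHERE r.book = ? AND r.chapter = ? AND r.verse = ?
        ORDER BY v.version_id
        """,
        (book, chapter, verse_no),
    ).fetchall()


print(versions_of("John", 3, 16))
# [('NIV1984', '[John 3:16, NIV 1984]'), ('XNT', '[John 3:16, XNT]')]
```

Adding a new version means inserting rows into version and verse_content; no ALTER TABLE or backfill is needed, and the NT-only version simply has no Genesis rows.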
I'm not sure how red letters or footnotes should be displayed. I know sites like Bible Gateway offer them as a toggle switch, so it's nice to have the option to sort it like this. Red letters could be a special static identifier directly in the verse content that is parsed out later into a CSS class. It could be its own foreign table too, but since it is so little, a delimiter would be really easy. It really depends on what you're using the data for: if you want to query for the red letters, it would be best as a foreign table (fast) rather than searching the DB for the delimiter (slow).
Footnotes depend on unique content, so they would be best stored in their own table. They could be identified and placed in the content using static reference points, like x characters in or x words in, and then linked to the content with a foreign key again. So the footnote table could be something like primary_id, foreign_id, reference, footnote, and an example row could be 2304, 452, 64, "some manuscripts don't include this". The primary key would be 2304, the foreign key that links to the content table is 452, the footnote is placed 64 characters in, and the footnote text is "some manuscripts don't include this". Incrementing footnote labels like A, B, C or 1, 2, 3 can all be generated dynamically. If it's important for them to be static letters/numbers, you might want to include them in the data, but I would rather have good data that allows this automatically than list it as static data.
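Here is a minimal Python sketch of the dynamic labeling idea. Footnote rows carry only a character offset, and the letters are generated in reading order at display time. The Footnote fields mirror the primary_id, foreign_id, reference, footnote layout above. The place_footnotes helper, the second footnote, and the sample sentence are hypothetical.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass
class Footnote:
    footnote_id: int  # primary key
    content_id: int   # foreign key to the verse-content row
    offset: int       # number of characters into the content
    text: str


def place_footnotes(content, footnotes):
    """Insert generated markers [a], [b], ... at each footnote's offset.

    Letters are assigned in reading order at display time, so they are not
    stored. Assumes at most 26 footnotes per passage.
    """
    pieces, notes, last = [], [], 0
    ordered = sorted(footnotes, key=lambda f: f.offset)
    for letter, note in zip(ascii_lowercase, ordered):
        pieces.append(content[last:note.offset])
        pieces.append(f"[{letter}]")
        notes.append(f"{letter}. {note.text}")
        last = note.offset
    pieces.append(content[last:])
    return "".join(pieces), notes


# Hypothetical rows for content row 452, given out of order on purpose.
rows = [
    Footnote(2305, 452, 25, "A second, hypothetical note"),
    Footnote(2304, 452, 9, "some manuscripts don't include this"),
]
text, notes = place_footnotes("The quick brown fox jumps.", rows)
print(text)  # The quick[a] brown fox jumps[b].
print(notes)
```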
Here's the hint: don't add hundreds of columns. That would just be a headache, and it's spreadsheet thinking. It's better to work out the right queries to link tables with the right content.
I am trying to determine what the best way is to find variations of a first name in a database. For example, I search for Bill Smith. I would like it return "Bill Smith", obviously, but I would also like it to return "William Smith", or "Billy Smith", or even "Willy Smith". My initial thought was to build a first name hierarchy, but I do not know where I could obtain such data, if it even exists.
Since users can search the directory, I thought this would be a key feature. For example, people I went to school with called me Joe, but I always go by Joseph now. So I was looking at doing a phonetic search on the last name, with either NYSIIS or Double Metaphone, and then searching on the first name using this name hierarchy. Is there a better way to do this, maybe some sort of graded relevance using a full-text search on the full name instead of a two-part search on the first and last name? Part of me thinks that if I stored a name as a single value instead of multiple values, it might allow more search options at the expense of being able to address a user by first name.
As far as the platform goes, I am using SQL Server 2005; however, I don't mind shifting some of the matching into code, for example pre-seeding the phonetic keys for a user, since they wouldn't change.
Any thoughts or guidance would be appreciated. Countless searches have pretty much turned up empty. Thanks!
Edit: It seems that there are two very distinct camps on the functionality, and I am definitely sitting in the middle right now. I can see the argument for a full-text search (most likely done without much data normalization) as well as for a multi-part approach that uses different criteria for different parts of the name.
The problem ultimately comes down to user intent. The Bill / William example is a good one, because it shows the mutation of a first name based upon the formality of the usage. I think that building a name hierarchy is the more accurate (and extensible) solution, but is going to be far more complex. The fuzzy search approach is easier to implement at the expense of accuracy. Is this a fair comparison?
Resolution: After doing some tests, I have decided to go with an approach where the initial registration takes a full name, which I will split into multiple fields (forename, surname, middle, suffix, etc.). Since I am sure that it won't be perfect, I will allow the user to edit the "parts", including adding a maiden or alternate name. As far as searching goes, with either solution I am going to need to maintain which variations exist, either in a database table or as a thesaurus. Neither has an advantage over the other in this case. I think it is going to come down to performance, and I will have to run some benchmarks to determine which is best. Thank you, everyone, for your input!
In my opinion you should either do a feature right and make it complete, or you should leave it off to avoid building a half-assed intelligence into a computer program that still gets it wrong most of the time ("Looks like you're writing a letter", anyone?).
In the case of human names, a computer will get it wrong most of the time; doing it right and complete is impossible, IMHO. Maybe you can hack something together that handles the most common English names. But actually, the intelligence to look for both "Bill" and "William" is built into almost any English-speaking person, so I would leave it to them to connect the dots.
The term you are looking for is Hypocorism:
http://en.wikipedia.org/wiki/Hypocorism
And Wikipedia lists many of them. You could bang out some Python or Perl to scrape that page and put it in a db.
I would go with a structure like this:
create table given_names (
id int primary key,
name text not null unique
);
create table hypocorisms (
id int references given_names(id),
name text not null,
primary key (id, name)
);
insert into given_names values (1, 'William');
insert into hypocorisms values (1, 'Bill');
insert into hypocorisms values (1, 'Billy');
Then you could write a function/sproc to normalize a name:
normalize_given_name('Bill'); --returns William
One issue you will face is that different names can have the same hypocorism (Albert -> Al, Alan -> Al).
I think your basic approach is solid. I don't think full-text search is going to help you. For seeding, behindthename.com seems to have a large amount of the data you want.
Are you using SQL Server 2005 Express with Advanced Services? It sounds to me like you would benefit from full-text indexing, and more specifically CONTAINS and CONTAINSTABLE, which accept specific search conditions. Here is a link on using CONTAINSTABLE:
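Here is a minimal sketch of the normalize_given_name idea, written as a Python function over the given_names/hypocorisms schema above (run in sqlite3 rather than as a stored procedure). Returning a list rather than a single name is my own choice, made to cope with the Al -> Albert/Alan ambiguity. The sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE given_names (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE hypocorisms (
    id   INTEGER REFERENCES given_names(id),
    name TEXT NOT NULL,
    PRIMARY KEY (id, name)
);
INSERT INTO given_names VALUES (1, 'William'), (2, 'Albert'), (3, 'Alan');
INSERT INTO hypocorisms VALUES (1, 'Bill'), (1, 'Billy'), (2, 'Al'), (3, 'Al');
""")


def normalize_given_name(name):
    """Return every given name that `name` may stand for, sorted.

    A formal name maps to itself; a hypocorism such as 'Al' can map to
    several given names, so the result is a list rather than one value.
    """
    rows = conn.execute(
        """
        SELECT g.name FROM given_names AS g
        WHERE g.name = :name COLLATE NOCASE
        UNION
        SELECT g.name FROM given_names AS g
        JOIN hypocorisms AS h ON h.id = g.id
        WHERE h.name = :name COLLATE NOCASE
        ORDER BY 1
        """,
        {"name": name},
    )
    return [row[0] for row in rows]


print(normalize_given_name("Bill"))  # ['William']
print(normalize_given_name("Al"))    # ['Alan', 'Albert']
```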
http://msdn.microsoft.com/en-us/library/ms189760.aspx
and here is the download link for SQL Server 2005 With Advanced Services:
http://www.microsoft.com/downloads/details.aspx?familyid=4C6BA9FD-319A-4887-BC75-3B02B5E48A40&displaylang=en
Hope this helps,
Andrew
You can use SQL Server Full-Text Search with FORMSOF, which supports inflectional and thesaurus searches.
For example:
SELECT ProductId, ProductName
FROM ProductModel
WHERE CONTAINS(CatalogDescription, ' FORMSOF(THESAURUS, metal) ')
Check out:
http://en.wikipedia.org/wiki/SQL_Server_Full_Text_Search#Inflectional_Searches
http://msdn.microsoft.com/en-us/library/ms345119.aspx
http://www.mssqltips.com/tip.asp?tip=1491
I'm not sure what your application is, but if your users know at sign-up that people from their past might be searching the database for them, you could give them the chance in their user profile to define other names they might be known by and that they want people to be able to search on (including last names: women change these all the time, which makes them much harder to find!). Store these in a separate related table, then search on that. Just make the structure such that you can define one name as the main name (the one you use for everything except the search).
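Here is a minimal Python/sqlite3 sketch of that related table. Every name is searchable, and exactly one is flagged as the main name used for display. The table names, the is_primary flag, the search helper, and the Joseph/Joe sample user are hypothetical. The partial unique index enforcing one main name per person is a SQLite feature; other databases have their own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE person_name (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    given_name TEXT NOT NULL,
    surname    TEXT NOT NULL,
    is_primary INTEGER NOT NULL DEFAULT 0  -- 1 marks the main (display) name
);
-- Partial unique index: at most one main name per person.
CREATE UNIQUE INDEX one_primary_name ON person_name(person_id) WHERE is_primary = 1;
-- Hypothetical user who goes by Joseph Smith but was once known as Joe Jones.
INSERT INTO person VALUES (1);
INSERT INTO person_name VALUES
    (1, 'Joseph', 'Smith', 1),
    (1, 'Joe',    'Smith', 0),
    (1, 'Joe',    'Jones', 0);
""")


def search(given_name, surname):
    """Match any searchable name, but return each person's main name."""
    rows = conn.execute(
        """
        SELECT DISTINCT main_name.given_name || ' ' || main_name.surname
        FROM person_name AS any_name
        JOIN person_name AS main_name
          ON main_name.person_id = any_name.person_id AND main_name.is_primary = 1
        WHERE any_name.given_name = ? COLLATE NOCASE
          AND any_name.surname = ? COLLATE NOCASE
        """,
        (given_name, surname),
    )
    return [row[0] for row in rows]


print(search("joe", "jones"))  # ['Joseph Smith']
```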
You'll find that you're dabbling in an area known as "Natural Language Processing" and you'll need to do several things, most of which can be found under the topic of stemming.
Simple stemming just breaks the word apart, but more advanced algorithms associate words that mean the same thing. For instance, Google might use stemming to convert "cat" and "kitten" to "feline" and search for all three, weighting the actual word provided by the user slightly more heavily so exact matches return before stemmed matches.
It's a known problem, and there are open source stemmers available.
-Adam
No, Full Text searches will not help to solve your problem.
I think you might want to take a look at some of the following links (funny, no one has mentioned SOUNDEX until now):
SoundEx - MSDN
SoundEx - Google results
InformIT - Tolerant Search algorithms
Basically, SOUNDEX lets you evaluate how similar two words sound. The function is also available in SQL Server 2005.
As a side issue, instead of returning similar results, it might be more intuitive for the user if you used an AJAX-based script to suggest similar-sounding names before he or she starts the search. That way you can show the user "similar names" or "did you mean..." kind of data.
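For intuition about what the codes look like, here is a minimal Python sketch of the classic American Soundex algorithm. It is not SQL Server's implementation and may differ from its SOUNDEX in edge cases. The soundex function name is my own.

```python
# American Soundex letter groups; vowels, y, h and w have no code.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return result.ljust(4, "0")


for n in ("Smith", "Smyth", "Bill", "William"):
    print(n, soundex(n))
# Smith S530
# Smyth S530
# Bill B400
# William W450
```

Soundex catches spelling variants such as Smith/Smyth but not nicknames: Bill and William get different codes. It complements a hypocorism table rather than replacing one.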
Here's an idea for automatically finding "name synonyms" like Bill/William. That problem has been studied in the broader context of synonyms in general: inducing them from statistics of which words commonly appear in the same contexts in a large text corpus like the Web. You could try combining that approach with a list of names like Moby Names; I don't know if it's been done before.
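The core of the distributional idea can be shown with a toy Python sketch. Words are represented by counts of their neighbors, and words used in similar contexts get a high cosine similarity. The three-sentence corpus is made up, the window size of 2 is an arbitrary choice, and a real system would need a Web-scale corpus plus a name list to filter candidates.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy, made-up corpus; a real system would need something Web-sized.
corpus = [
    "bill smith called the office",
    "william smith called the office",
    "the invoice arrived at the office",
]
vectors = context_vectors(corpus)
print(round(cosine(vectors["bill"], vectors["william"]), 3))  # 1.0
print(round(cosine(vectors["bill"], vectors["invoice"]), 3))  # 0.0
```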
Here are some pointers.