SQL / (Django) : efficient database schema for translations - sql

Situation
I am trying to set up a database schema to store translations between different languages. So far it looks like this (simplified):
from django.db import models


class Language(models.Model):
    tag = models.CharField(max_length=2)

    def __unicode__(self):
        return self.tag


class Phrase(models.Model):
    name = models.TextField()
    language = models.ForeignKey(Language)

    def __unicode__(self):
        return self.name

    class Meta:
        unique_together = ("name", "language")
        index_together = [
            ["name", "language"]
        ]


class Translation(models.Model):
    phrase1 = models.ForeignKey(Phrase, related_name="translation_as_1")
    phrase2 = models.ForeignKey(Phrase, related_name="translation_as_2")

    def __unicode__(self):
        return self.phrase1.name + " <=> " + self.phrase2.name

    class Meta:
        unique_together = ("phrase1", "phrase2")
        index_together = [
            ["phrase1", "phrase2"]
        ]
This database schema seems logical to me. I store phrases in different languages and then have translations that contain exactly two phrases.
Problem
The problem is that the queries resulting from this schema look kind of nasty. For instance:
from django.db.models import Q

name = "my phrase"
translations = Translation.objects.filter(Q(phrase1__name=name) | Q(phrase2__name=name))
translated_names = []
for translation in translations:
    name1 = translation.phrase1.name
    name2 = translation.phrase2.name
    # The matched phrase may sit on either side, so keep the other one.
    if name1 == name:
        translated_names.append(name2)
    else:
        translated_names.append(name1)
I always have to include the OR condition to make sure that I get all the possible translations, since the phrase could be stored as either phrase1 or phrase2. On top of that, I have to filter the result afterwards (the for loop) to pick out the correct translated name.
Further Explanation
Before I switched to the described schema, I had the following schema instead (Phrase and Language are the same as before):
class Translation(models.Model):
    phrase = models.ForeignKey(Phrase)
    name = models.TextField()

    def __unicode__(self):
        return self.phrase.name + " => " + self.name

    class Meta:
        unique_together = ("phrase", "name")
        index_together = [
            ["phrase", "name"]
        ]
This schema let me make queries like this:
name = "my phrase"
translations = Translation.objects.filter(phrase__name=name)
translated_names = [t.name for t in translations]
This looks much nicer and is, of course, faster. But this schema had the disadvantage that it represents translations in only one direction, so I moved to the other one, which isn't quite what I want either, because the queries are too slow and too complicated.
Question
So is there a good schema for this kind of problem that I may have overlooked?
Remark
I'm not only interested in Django related answers. A pure SQL schema for this kind of problem would also be interesting for me.

This is the way that I have done it in the past. Adapt it for your naming convention.
Suppose that I had a table with a name and other columns in it like this
Table TR_CLT_clothing_type
clt_id | clt_name | other columns ....
--------------------------------------
1 | T Shirt ...
2 | Pants ...
Now if I decided that it needs translations, first I make a languages table
Table TR_LNG_language
lng_id | lng_name | lng_display
-------------------------------
1 | English | English (NZ)
2 | German | Deutsch
I also need to store the current language in the database (you will see why soon). It will only have one row
Table TA_INF_info
inf_current_lng
---------------
1
Then I drop the clt_name column from my clothing table TR_CLT_clothing_type. Instead I make relation table.
Table TL_CLT_clothing_type
clt_id | lng_id | clt_name
--------------------------
1 | 1 | T Shirt
1 | 2 | (German for T-Shirt)
2 | 1 | Pants
2 | 2 | keuchen (thank you google translate)
Now to get the name, you want to make a stored procedure for it. I have not attempted this in ORM.
CREATE PROCEDURE PS_CLT
    @clt_id int
AS
    SELECT lng.clt_name, clt.*
    FROM TR_CLT_clothing_type clt
    JOIN TL_CLT_clothing_type lng
        ON lng.clt_id = clt.clt_id
    WHERE clt.clt_id = @clt_id
      AND lng.lng_id IN (SELECT inf_current_lng FROM TA_INF_info)
This stored proc will return the name in the current language, along with all other columns, for the specified clt_id. To set the language, set inf_current_lng in the TA_INF_info table.
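The lookup above can be tried out without a stored procedure. The following is only a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The helper names clothing_name and set_current_language are made up for illustration, the German sample names are my own sample data, and the query returns only the name rather than all columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TR_LNG_language (lng_id INTEGER PRIMARY KEY, lng_name TEXT, lng_display TEXT);
    CREATE TABLE TA_INF_info (inf_current_lng INTEGER REFERENCES TR_LNG_language(lng_id));
    CREATE TABLE TR_CLT_clothing_type (clt_id INTEGER PRIMARY KEY);
    CREATE TABLE TL_CLT_clothing_type (
        clt_id INTEGER REFERENCES TR_CLT_clothing_type(clt_id),
        lng_id INTEGER REFERENCES TR_LNG_language(lng_id),
        clt_name TEXT,
        PRIMARY KEY (clt_id, lng_id)
    );
    INSERT INTO TR_LNG_language VALUES (1, 'English', 'English (NZ)'), (2, 'German', 'Deutsch');
    INSERT INTO TA_INF_info VALUES (1);  -- single row: the current language
    INSERT INTO TR_CLT_clothing_type VALUES (1), (2);
    INSERT INTO TL_CLT_clothing_type VALUES
        (1, 1, 'T Shirt'), (1, 2, 'T-Shirt'), (2, 1, 'Pants'), (2, 2, 'Hose');
""")


def clothing_name(conn, clt_id):
    """Return the name of a clothing type in the current language, or None."""
    row = conn.execute(
        """
        SELECT lng.clt_name
        FROM TR_CLT_clothing_type AS clt
        JOIN TL_CLT_clothing_type AS lng ON lng.clt_id = clt.clt_id
        WHERE clt.clt_id = ?
          AND lng.lng_id IN (SELECT inf_current_lng FROM TA_INF_info)
        """,
        (clt_id,),
    ).fetchone()
    return row[0] if row else None


def set_current_language(conn, lng_id):
    """Switch the current language by updating the single info row."""
    conn.execute("UPDATE TA_INF_info SET inf_current_lng = ?", (lng_id,))


print(clothing_name(conn, 2))   # name in English (language 1)
set_current_language(conn, 2)
print(clothing_name(conn, 2))   # name in German (language 2)
```

Keeping the current language in a table means every caller sees the same language, which may or may not be what a multi-user application wants. Passing the language id as a parameter would be an alternative.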
Disclaimer: I don't have anything to check the syntax of what I have typed but it should hopefully be straightforward.
-- EDIT
There was a concern to be able to do "give me all translations for word X in language Y to language Z"
There is a "not so elegant" way to do this with the schema. You can do something like
for each table in database like "TL_%"
    SELECT name
    FROM table
    WHERE id IN ( SELECT id
                  FROM table
                  WHERE name = @name
                  AND lng_id = german
                )
    AND lng_id = english
Now I would imagine that this would require some auto-generated SQL code but I could pull it off.
I have no idea how you would do this in ORM
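For a single translation table, the "word X in language Y to language Z" lookup can be sketched as below. This is only an illustration in Python with the standard sqlite3 module. It uses a self-join instead of the IN subquery shown above (the two should return the same names here), the translate function name is hypothetical, and the sample rows are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TL_CLT_clothing_type (
        clt_id INTEGER,
        lng_id INTEGER,          -- 1 = English, 2 = German in this sample
        clt_name TEXT,
        PRIMARY KEY (clt_id, lng_id)
    );
    INSERT INTO TL_CLT_clothing_type VALUES
        (1, 1, 'T Shirt'), (1, 2, 'T-Shirt'), (2, 1, 'Pants'), (2, 2, 'Hose');
""")


def translate(conn, name, from_lng, to_lng):
    """Return all names in to_lng that share an id with `name` in from_lng."""
    rows = conn.execute(
        """
        SELECT dst.clt_name
        FROM TL_CLT_clothing_type AS src
        JOIN TL_CLT_clothing_type AS dst ON dst.clt_id = src.clt_id
        WHERE src.clt_name = ?
          AND src.lng_id = ?
          AND dst.lng_id = ?
        ORDER BY dst.clt_name
        """,
        (name, from_lng, to_lng),
    ).fetchall()
    return [r[0] for r in rows]


print(translate(conn, "Hose", 2, 1))   # German to English
print(translate(conn, "Pants", 1, 2))  # English to German
```

Because both directions go through the shared clt_id, no OR condition and no post-filtering in application code is needed.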

Related

Keep a relation map in Objection.js while removing the table

I'm developing a reddit-like site where votes are stored per-user (instead of per-post). Here's my relevant schema:
content
id | author_id | title | text
---|-----------|-------------|---
1 | 1 (adam) | First Post | This is a test post by adam
vote: All the votes ever voted by anyone on any post
id | voter_id | content_id | category_id
---|-------------|------------------|------------
1 | 1 (adam) | 1 ("First Post") | 1 (upvote)
2 | 2 (bob) | 1 ("First Post") | 1 (upvote)
vote_count: Current tally ("count") of total votes received by a post by all users
id | content_id | category_id | count
---|------------------|--------------|-------
1 | 1 ("First Post") | 1 (upvote) | 2
I've defined a voteCount relation in Objection.js model for the content table:
class Content extends Model {
  static tableName = 'content';
  static relationMappings = {
    voteCount: {
      relation: Model.HasManyRelation,
      modelClass: VoteCount,
      join: {
        from: 'content.id',
        to: 'vote_count.content_id'
      }
    }
  }
}
But I recently (learned and) decided that I don't need to keep (and update) a separate vote_count table, when in fact I can just query the vote table and essentially get the same table as a result:
SELECT content_id
     , category_id
     , COUNT(*) AS count
  FROM vote
 GROUP
    BY content_id
     , category_id
So now I wanna get rid of the vote_count table entirely.
But it seems that would break my voteCount relation, since there won't be a VoteCount model anymore either (not shown here, but it's the model corresponding to the vote_count table). (Right?)
How do I keep voteCount relation while getting rid of vote_count table (and thus VoteCount model with it)?
Is there a way to somehow specify in the relation that instead of looking at a concrete table, it should look at the result of a query? Or is it possible to define a model class for the same?
My underlying database is PostgreSQL, if that helps.
Thanks to @Belayer. Views were exactly the solution to this problem.
Objection.js supports using views (instead of table) in a Model class, so all I had to do was create a view based on the above query.
I'm also using Knex's migration strategy to create/version my database, and although it doesn't (yet) support creating views out of the box, I found you can just use raw queries:
module.exports.up = async function(knex) {
  await knex.raw(`
    CREATE OR REPLACE VIEW "vote_count" AS (
      SELECT content_id
           , category_id
           , COUNT(*) AS count
        FROM vote
       GROUP
          BY content_id
           , category_id
    )
  `);
};

module.exports.down = async function(knex) {
  await knex.raw('DROP VIEW "vote_count";');
};
The above migration step replaces my vote_count table with the equivalent view, and the Objection.js Model class for it (VoteCount) worked as usual without needing any change, and so did the voteCount relation on the Content class.
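To see why the view can stand in for the table, here is a small sketch in Python using the standard sqlite3 module. It uses SQLite in memory rather than PostgreSQL and has nothing to do with Objection.js or Knex themselves. The vote_counts helper is a made-up name, and the column list of the vote table is taken from the schema shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vote (
        id INTEGER PRIMARY KEY,
        voter_id INTEGER NOT NULL,
        content_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL
    );
    -- The view is queried like a table, but is computed from vote on every read.
    CREATE VIEW vote_count AS
        SELECT content_id, category_id, COUNT(*) AS "count"
        FROM vote
        GROUP BY content_id, category_id;
    INSERT INTO vote (voter_id, content_id, category_id) VALUES (1, 1, 1), (2, 1, 1);
""")


def vote_counts(conn, content_id):
    """Return (category_id, count) pairs for one piece of content."""
    return conn.execute(
        'SELECT category_id, "count" FROM vote_count '
        "WHERE content_id = ? ORDER BY category_id",
        (content_id,),
    ).fetchall()


print(vote_counts(conn, 1))
```

Since the tally is derived at query time, there is nothing to keep in sync when a vote is added or removed. The trade-off is that the aggregation runs on each read.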

Efficiently return words that match, or whose synonym(s), match a keyword

I have a database of industry-specific terms, each of which may have zero or more synonyms. Users of the system can search for terms by keyword and the results should include any term that contains the keyword or that has at least one synonym that contains the keyword. The result should then include the term and ONLY ONE of the matching synonyms.
Here's the setup... I have a term table with 2 fields: id and term. I also have a synonym table with 3 fields: id, termId, and synonym. So there would be data like:
term Table
id | term
-- | -----
1 | dog
2 | cat
3 | bird
synonym Table
id | termId | synonym
-- | ------ | --------
1 | 1 | canine
2 | 1 | man's best friend
3 | 2 | feline
A keyword search for (the letter) "i" should return the following as a result:
id | term | synonym
-- | ------ | --------
1 | dog | canine <- because of the "i" in "canine"
2 | cat | feline <- because of the "i" in "feline"
3 | bird | <- because of the "i" in "bird"
Notice how, even though both "dog" synonyms contain the letter "i", only one was returned in the result (doesn't matter which one).
Because I need to return all matches from the term table regardless of whether or not there's a synonym and I need no more than 1 matching synonym, I'm using an OUTER APPLY as follows:
SELECT
    term.id,
    term.term,
    synonyms.synonym
FROM
    term
    OUTER APPLY (
        SELECT
            TOP 1
            term.id,
            synonym.synonym
        FROM
            synonym
        WHERE
            term.id = synonym.termId
            AND synonym.synonym LIKE @keyword
    ) AS synonyms
WHERE
    term.term LIKE @keyword
    OR synonyms.synonym LIKE @keyword
There are indexes on term.term, synonym.termId and synonym.synonym. @keyword is always something like '%foo%'. The problem is that, with close to 50,000 terms (not that much for databases, I know, but...), the performance is horrible. Any thoughts on how this can be done more efficiently?
Just a note, one thing I had thought to try was flattening the synonyms into a comma-delimited list in the term table so that I could get around the OUTER APPLY. Unfortunately though, that list can easily exceed 900 characters which would then prevent SQL Server from adding an index to that column. So that's a no-go.
Thanks very much in advance.
You've got a lot of unnecessary logic in there. There's no telling how SQL server is creating an execution path. It's simpler and more efficient to split this up into two separate db calls and then merge them in your code:
Get matches based on synonyms:
SELECT
    term.id
    ,term.term
    ,synonym.synonym
FROM
    term
    INNER JOIN synonym ON term.id = synonym.termId
WHERE
    synonym.synonym LIKE @keyword
Get matches based on terms:
SELECT
    term.id
    ,term.term
FROM
    term
WHERE
    term.term LIKE @keyword
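The "merge them in your code" step could look roughly like the sketch below. It is written in Python with the standard sqlite3 module and an in-memory copy of the sample data from the question, so it illustrates the merging logic rather than SQL Server performance. The search function name is hypothetical, and the keyword is not escaped for LIKE wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, term TEXT);
    CREATE TABLE synonym (id INTEGER PRIMARY KEY, termId INTEGER, synonym TEXT);
    INSERT INTO term VALUES (1, 'dog'), (2, 'cat'), (3, 'bird');
    INSERT INTO synonym VALUES
        (1, 1, 'canine'), (2, 1, 'man''s best friend'), (3, 2, 'feline');
""")


def search(conn, keyword):
    """Return (id, term, synonym) rows, with at most one synonym per term."""
    pattern = f"%{keyword}%"
    results = {}

    # Query 1: terms with a matching synonym; setdefault keeps only the first.
    synonym_rows = conn.execute(
        """
        SELECT term.id, term.term, synonym.synonym
        FROM term
        INNER JOIN synonym ON term.id = synonym.termId
        WHERE synonym.synonym LIKE ?
        ORDER BY synonym.id
        """,
        (pattern,),
    )
    for term_id, term, synonym in synonym_rows:
        results.setdefault(term_id, (term_id, term, synonym))

    # Query 2: terms that match directly; add them only if not already present.
    term_rows = conn.execute(
        "SELECT id, term FROM term WHERE term LIKE ?", (pattern,)
    )
    for term_id, term in term_rows:
        results.setdefault(term_id, (term_id, term, None))

    return [results[term_id] for term_id in sorted(results)]


for row in search(conn, "i"):
    print(row)
```

A dictionary keyed by term id handles both requirements at once: duplicates between the two queries collapse, and only one synonym per term survives.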
Regarding "flattening the synonyms into a comma-delimited list in the term table": have you considered using the Full Text Search feature? It would be much faster, even as your data grows.
You can put all synonyms (as comma delimited) in "synonym" column and put full text index on the same.
If you want to get results also with the synonyms of the words, I recommend you to use Freetext. This is an example:
SELECT Title, Text, * FROM [dbo].[Post] where freetext(Title, 'phone')
The previous query will match the word ‘phone’ by its meaning, not just the exact word. It will also compare the inflectional forms of the words. In this case it will return any title that has ‘mobile’, ‘telephone’, ‘smartphone’, etc.
Take a look at this article about SQL Server Full Text Search, hope it helps

django complex datamodel

I am creating a small Django project which shows stats collected from Twitter data.
For example, my tables are:
hashDetails
---------------------------------------------
id hashname tweetPosted trendDate userid
---------------------------------------------
1 #abc 44 2-2-2016 #xyz
2 #abc 55 2-2-2016 #qwer
3 #xcs 55 3-2-2016 #qwer
4 #xcs 55 4-2-2016 #qwer
---------------------------------------------
userDetails
----------------------------------------------
id userid profileImage profileImage
----------------------------------------------
1 #xyz image2.jpg www.abc.com
2 #qwer image3.jpg www.xadf.com
----------------------------------------------
For this, if I create models.py:
class userDetails(models.Model):
    userid = models.CharField(max_length=30)
    profileImage = models.CharField(max_length=30)
    profileImage = models.CharField(max_length=30)


class hashDetails(models.Model):
    hashname = models.CharField(max_length=30)
    tweetPosted = models.IntegerField()
    trendDate = models.DateTimeField()
    userid = models.ForeignKey(userDetails, to_field='userid')
But I don't want to make userid unique, because
I want to be able to enter data into both tables manually,
and when I query in my view it should search both tables.
For example:
if I want all trends by #xyz,
or if I want a list of all users who posted the #abc trend,
or if I want all trends on a specific date.
In short, I want both tables to behave like one.
I can't make userid unique: my daily data will be about 20MB, so you can assume it's difficult to find ids.
I found a solution to my problem and it's working for me.
I just created two normal models without a ForeignKey or any relation,
defined my function in views.py,
and got the result I wanted.
def teamdetail(request, test_id):
    hashd = hashDetails.objects.get(hashname=test_id)
    userd = userDetails.objects.all()
    context = {'hashinfo': hashd, 'username': userd}
    return render(request, 'test/hashDetails.html', context)

How to make SQL query that will combine rows of result from one table with rows of another table in specific conditions in SQLite

I have a SQLite3 database with three tables. Sample data looks like this:
Original
id aName code
------------------
1 dog DG
2 cat CT
3 bat BT
4 badger BDGR
... ... ...
Translated
id orgID isTranslated langID aName
----------------------------------------------
1 2 1 3 katze
2 1 1 3 hund
3 3 0 3 (NULL)
4 4 1 3 dachs
... ... ... ... ...
Lang
id Langcode
-----------
1 FR
2 CZ
3 DE
4 RU
... ...
I want to select all data from Original and Translated in way that result would consist of all data in Original table, but aName of rows that got translation would be replaced with aName from Translated table, so then I could apply an ORDER BY clause and sort data in the desired way.
All data and table designs are examples just to show the problem. The schema does contain some elements like an isTranslated column or translation and original names in separate tables. These elements are required by application destination/design.
To be more specific this is an example rowset I would like to produce. It's all the data from table Original modified by data from Translated if translation is available for that certain id from Original.
Desired Result
id aName code isTranslated
---------------------------------
1 hund DG 1
2 katze CT 1
3 bat BT 0
4 dachs BDGR 1
... ... ... ...
This is a typical application for the CASE expression:
SELECT Original.id,
       CASE isTranslated
           WHEN 1 THEN Translated.aName
           ELSE Original.aName
       END AS aName,
       code,
       isTranslated
FROM Original
JOIN Translated ON Original.id = Translated.orgID
WHERE Translated.langID = (SELECT id FROM Lang WHERE Langcode = 'DE')
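If not all records in Original have a corresponding record in Translated, use LEFT JOIN instead, and move the langID condition from the WHERE clause into the ON clause so that rows without a translation are not filtered out again.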
If untranslated names are guaranteed to be NULL, you can just use IFNULL(Translated.aName, Original.aName) instead.
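The LEFT JOIN and IFNULL variant can be checked against the sample data from the question. What follows is a minimal sketch in Python with the standard sqlite3 module. The names_in function name is made up for illustration, and the language condition sits in the ON clause so that rows without a translation survive the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Original (id INTEGER PRIMARY KEY, aName TEXT, code TEXT);
    CREATE TABLE Lang (id INTEGER PRIMARY KEY, Langcode TEXT);
    CREATE TABLE Translated (
        id INTEGER PRIMARY KEY,
        orgID INTEGER,
        isTranslated INTEGER,
        langID INTEGER,
        aName TEXT
    );
    INSERT INTO Original VALUES
        (1, 'dog', 'DG'), (2, 'cat', 'CT'), (3, 'bat', 'BT'), (4, 'badger', 'BDGR');
    INSERT INTO Lang VALUES (1, 'FR'), (2, 'CZ'), (3, 'DE'), (4, 'RU');
    INSERT INTO Translated VALUES
        (1, 2, 1, 3, 'katze'), (2, 1, 1, 3, 'hund'),
        (3, 3, 0, 3, NULL), (4, 4, 1, 3, 'dachs');
""")


def names_in(conn, lang_code):
    """Return every Original row, using the translated name where one exists."""
    return conn.execute(
        """
        SELECT o.id,
               IFNULL(t.aName, o.aName) AS aName,
               o.code,
               IFNULL(t.isTranslated, 0) AS isTranslated
        FROM Original AS o
        LEFT JOIN Translated AS t
               ON t.orgID = o.id
              AND t.langID = (SELECT id FROM Lang WHERE Langcode = ?)
        ORDER BY o.id
        """,
        (lang_code,),
    ).fetchall()


for row in names_in(conn, "DE"):
    print(row)
```

Replacing ORDER BY o.id with ORDER BY aName would sort by the displayed name, which is the sorting the question asks for.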
You should probably list the actual results you want, which would help people help you in the future.
In the current case, I'm guessing you want something along these lines:
SELECT Original.id, Original.code, Translated.aName
FROM Original
JOIN Lang
    ON Lang.langCode = 'DE'
JOIN Translated
    ON Translated.orgId = Original.id
    AND Translated.langId = Lang.id
    AND Translated.aName IS NOT NULL;
(Check out my example to see if these are the results you want).
In any case, the table set you've got is heading towards a fairly standard 'translation table' setup. However, there are some basic changes I'd make.
Original
Name the table to something specific, like Animal
Don't include a 'default' translation in the table (you can use a view, if necessary).
'code' is fine, although in the case of animals, genus/species probably ought to be used
Lang
'Language' is often a reserved word in RDBMSs, so the name is fine.
Specifically name which 'language code' you're using (and don't abbreviate column names). There's actually (up to) three different ISO codes possible - just grab them all.
(Also, remember that languages have language-specific names, so language also needs its own 'translation' table)
Translated
Name the table entity-specific, like AnimalNameTranslated, or somesuch.
isTranslated is unnecessary - you can derive it from the existence of the row - don't add a row if the term isn't translated yet.
Put all 'translations' into the table, including the 'default' one. This means all your terms are in one place, so you don't have to go looking elsewhere.

Django: ManyToManyField with additional Column

I am trying to create a job application-form with Django.
Basically, I created two models.
softwareskill_model
application_model
The admin can log into the admin-section and add new softwareskill-
entries to the database. The application_model references those
softwareskill-entries/records using a ManyToMany-Field:
class softwareskill_model(django.db.models.Model):
    name = django.db.models.CharField(max_length=200)


class application_model(django.db.models.Model):
    # ...
    softwareskills = django.db.models.ManyToManyField(softwareskill_model)
So if someone wants to apply for the job, he can select which
software-packages he uses.
Now I want the applicant to make a rating from 1-6 for each software-skill
he has selected. How do you do that?
I am using a SQLite3 database and discovered that the ManyToManyField
creates a new table to store the relationship. In my case it looks like
this:
| ID | application_model_id | softwareskill_model_id |
My assumption would be to simply add a new column so it looks like this:
| ID | application_model_id | softwareskill_model_id | Rating |
Is that possible / the best way to do it? How?
I am very new to Django, databases and web-development in general and hope
you can help me :-)!
Thank you,
Henry
through is what you need to use, e.g.
class softwareskill_model(django.db.models.Model):
    name = django.db.models.CharField(max_length=200)


class application_model(django.db.models.Model):
    # ...
    softwareskills = django.db.models.ManyToManyField(softwareskill_model, through="ApplicationSoftwareSkill")


class ApplicationSoftwareSkill(django.db.models.Model):
    softwareskill = django.db.models.ForeignKey(softwareskill_model)
    application = django.db.models.ForeignKey(application_model)
    # extra fields here, e.g.
    rating = django.db.models.IntegerField()
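At the database level, a through model amounts to the intermediate table from the question with a rating column added. The sketch below shows that idea in plain Python with the standard sqlite3 module rather than Django. The table names, the rate_skill and skills_for helpers, the CHECK constraint for the 1-6 range, and the sample skill names are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE softwareskill (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE application (id INTEGER PRIMARY KEY, applicant TEXT NOT NULL);
    -- Intermediate table: one row per (application, skill) pair, plus the rating.
    CREATE TABLE application_softwareskill (
        id INTEGER PRIMARY KEY,
        application_id INTEGER NOT NULL REFERENCES application(id),
        softwareskill_id INTEGER NOT NULL REFERENCES softwareskill(id),
        rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 6),
        UNIQUE (application_id, softwareskill_id)
    );
    INSERT INTO softwareskill VALUES (1, 'Blender'), (2, 'GIMP'), (3, 'Inkscape');
    INSERT INTO application VALUES (1, 'Henry');
""")


def rate_skill(conn, application_id, softwareskill_id, rating):
    """Link a skill to an application and store the applicant's rating."""
    conn.execute(
        "INSERT INTO application_softwareskill "
        "(application_id, softwareskill_id, rating) VALUES (?, ?, ?)",
        (application_id, softwareskill_id, rating),
    )


def skills_for(conn, application_id):
    """Return (skill name, rating) pairs for one application."""
    return conn.execute(
        """
        SELECT s.name, link.rating
        FROM application_softwareskill AS link
        JOIN softwareskill AS s ON s.id = link.softwareskill_id
        WHERE link.application_id = ?
        ORDER BY s.name
        """,
        (application_id,),
    ).fetchall()


rate_skill(conn, 1, 1, 5)
rate_skill(conn, 1, 2, 3)
print(skills_for(conn, 1))
```

The rating lives on the link row because it describes the pair (this applicant, this skill), not the skill or the application on its own.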