SQLite design for cross-linked tables and foreign key usage

I don't know if this is the right place to ask. Since it is a question about SQL database design, I considered Database Administrators, but that site targets database professionals (and I'm absolutely not a professional), so I'll just post my question here. Please point me to the right place if you think there's a better one for this type of question.
Getting to the question.
I'm designing a database for translations of literary works. Because this involves people, and people often don't fit in a "static" data model, I have a pretty convoluted schema. Here is just a section of it, regarding people's names. Because foreign authors are involved (especially Japanese ones), I have the added problem of transliterating people's names. At present the structure of the database for people and names is as follows
Let's take an example:
I have a person called "Kyokutei Bakin", which transliterates as 曲亭馬琴 in ideograms and キョクテイ バキン in the Japanese phonetic alphabet. This author is also known as "Takizawa Bakin" (滝沢馬琴, タキザワ バキン) and so on...
The three-table structure with one-to-many relationships accounts for a person having multiple names (biographical_name, pen_name, etc.) and for the fact that every name can have multiple phonetic readings.
This is all good. When I search for someone I just LEFT JOIN the tables and add OR conditions for the various fields. eg:
SELECT DISTINCT name.name_text, phonetic_name.name_text FROM name
LEFT JOIN phonetic_name ON (name.name_id=phonetic_name.name_id)
WHERE (name.name_text LIKE '%bak%')
OR (phonetic_name.name_text LIKE '%馬琴%');
My problem is that I want one of the names to be the main name of that person. The way I've done it is adding a "main_name" column in the "person" table that points to the "name_id" column of the "name" table. So that I can JOIN name ON (person.main_name=name.name_id) when I want just the main name.
My doubts are:
- Is it good practice to cross-link two tables?
(Here "name" references "person" on person_id, but at the same time "person" references "name" for main_name.)
- Can this cause problems?
- How do I set foreign keys in this kind of situation?
- In case this is way too messy, how can I improve the design?
Additional info:
Since this is a design problem, the SQL implementation should not be that important, but just in case it matters, I'm using sqlite3.

I would personally simplify the design like this:
Table: person
person_id (primary key)
...
Table: name
name_id (primary key)
name
name_type
parent_name_id (foreign key of itself)
person_id (foreign key of person table)
The table name has a recursive relationship where parent_name_id contains the name_id of the main name of the person. Note that for the main name itself, name_id=parent_name_id. In the column name_type you can store the type of name (phonetic, ideogram, kanji, etc.). You can normalize name_type further into a dedicated table if you want pure third normal form.
I would say the main benefit of this design is that it greatly simplifies your query when querying for names of any type. You can simply run something like this:
Select distinct b.person_id, b.name as main_name
From name a
Inner join name b on a.parent_name_id=b.name_id
Where a.name like '%...%'
In addition you can store as many names as you want for a single person.
If you want to return several names from different types you can do like this:
Select distinct b.person_id,
b.name as main_name,
c.name as kanji_name,
d.name as katakana_name
From name a
Inner join name b on a.parent_name_id=b.name_id
Left join name c on b.parent_name_id=c.parent_name_id and c.name_type='kanji'
Left join name d on b.parent_name_id=d.parent_name_id and d.name_type='katakana'
-- etc. (one left join per additional name type)
Where a.name like '%...%'
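To make the self-referencing design above concrete, here is a minimal sketch using Python's built-in sqlite3 module (the asker mentioned sqlite3). The sample rows and the name_type value 'romaji' are assumptions made up for illustration; only the table and column names come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY
);
CREATE TABLE name (
    name_id        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    name_type      TEXT NOT NULL,
    parent_name_id INTEGER NOT NULL REFERENCES name(name_id),
    person_id      INTEGER NOT NULL REFERENCES person(person_id)
);
INSERT INTO person (person_id) VALUES (1);
-- The main name points at itself (name_id = parent_name_id).
INSERT INTO name VALUES (1, 'Kyokutei Bakin', 'romaji', 1, 1);
-- Every other name of the person points at the main name.
INSERT INTO name VALUES (2, '曲亭馬琴', 'kanji', 1, 1);
INSERT INTO name VALUES (3, 'キョクテイ バキン', 'katakana', 1, 1);
INSERT INTO name VALUES (4, 'Takizawa Bakin', 'romaji', 1, 1);
""")


def find_main_names(pattern):
    """Return (person_id, main_name) for every person with a name matching pattern."""
    return conn.execute(
        """
        SELECT DISTINCT b.person_id, b.name AS main_name
        FROM name a
        INNER JOIN name b ON a.parent_name_id = b.name_id
        WHERE a.name LIKE ?
        """,
        (pattern,),
    ).fetchall()


print(find_main_names("%馬琴%"))
print(find_main_names("%takizawa%"))
```

Both searches resolve to the same main name, whichever spelling matched. Because parent_name_id lives in the name table, person no longer needs a column pointing back at name, so the circular reference from the question disappears.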

Related

SQL database structure with two changing properties

Let's assume I am building the backend of a university management software.
I have a users table with the following columns:
id
name
birthday
last_english_grade
last_it_grade
profs table columns:
id
name
birthday
I'd like to have a third table with which I can determine all professors teaching a student.
So I'd like to assign multiple teachers to each student.
Those Professors may change any time.
New students may be added any time too.
What's the best way to achieve this?
The canonical way to do this would be to introduce a third junction table, which exists mainly to relate users to professors:
users_profs (
user_id,
prof_id,
PRIMARY KEY (user_id, prof_id)
)
The primary key of this junction table is the combination of a user and professor ID. Note that this table is fairly lean, and avoids the problem of repeating metadata for a given user or professor. Rather, user/professor information remains in your two original tables, and does not get repeated.
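As a rough illustration of the junction table, here is a small sketch with Python's sqlite3 module. The helper names (assign, unassign, profs_for), the sample rows, and the ON DELETE CASCADE clauses are assumptions added for the example, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE profs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users_profs (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    prof_id INTEGER NOT NULL REFERENCES profs(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, prof_id)
);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO profs VALUES (10, 'Prof. Smith'), (11, 'Prof. Jones');
""")


def assign(user_id, prof_id):
    """Link a student to a professor; only the junction table changes."""
    conn.execute("INSERT INTO users_profs VALUES (?, ?)", (user_id, prof_id))


def unassign(user_id, prof_id):
    """Remove the link without touching users or profs."""
    conn.execute(
        "DELETE FROM users_profs WHERE user_id = ? AND prof_id = ?",
        (user_id, prof_id),
    )


def profs_for(user_id):
    """Names of all professors currently teaching the given student."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM users_profs up
        JOIN profs p ON p.id = up.prof_id
        WHERE up.user_id = ?
        ORDER BY p.name
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Assigning or removing a professor is a single-row insert or delete in users_profs, which is what makes frequent changes cheap.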

Extending table with another table ... sort of

I have a DB about renting cars.
I created a CarModels table (ModelID as PK).
I want to create a second table with the same primary key as CarModels have.
This table only contains the number of times this Model was searched on my website.
So let's say you visit my website: you can check a list that contains the most commonly rented cars.
"Most popular Cars" table.
It's not about One-to-One relationship, that's for sure.
Is there any SQL code to connect two primary keys together?
select m.ModelID, m.Field1, m.Field2,
t.TimesSearched
from CarModels m
left outer join Table2 t on m.ModelID = t.ModelID
But why not simply add the field TimesSearched to the CarModels table?
Then you don't need another table.
Easiest is to just use a new primary key on the new table with a foreign key to the CarModels table, like [CarModelID] INT NOT NULL. You can put an index and a unique constraint on the FK.
If you reeeealy want them to be the same, you can jump through a bunch of hoops that will make your life Hell, like creating the table from the CarModels table, then setting that field as the primary key, then whenever you add a new CarModel you'll have to create a trigger that will SET IDENTITY_INSERT ON so you can add the new one, and remember to SET IDENTITY_INSERT OFF when you're done.
Personally, I'd create a CarsSearched table that holds ThisUser selected ThisCarModel on ThisDate: then you can start doing some fun data analysis like [are some cars more popular in certain zip codes or certain times of year?], or [this company rents three cars every year in March, so I'll send them a coupon in January].
You are not extending anything (that is, modifying the actual model of the table). You simply need an INNER JOIN that links the two tables on their equal primary keys.
It could be an outer join, as has been suggested, but if it's 1:1 like you said (the second table will have the exact same keys, and I assume all of them), an inner join will be enough, as both tables would have the same set of primary keys.
As a bonus, an inner join returns fewer rows when some keys have no match, which is a nice reminder that you failed to match all PKs.
That being said, do you have a strong reason not to keep that number in the same table? You are basically modeling a 1:1 relationship for one extra column (and a small one too, by data type).
You could extend the table (now this is extending the table's model) with an additional integer attribute that keeps that number for you.
The latter is preferred for simplicity and lower query times.
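For anyone who still wants the separate table, here is a hedged sketch of the shared-key variant, written against SQLite through Python's sqlite3 module rather than SQL Server. The table name ModelSearchCounts, the Name column, and the helper functions are made up for the example. The second table's primary key is also a foreign key to CarModels, so no identity juggling is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE CarModels (ModelID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- ModelID is both the primary key here and a foreign key to CarModels.
CREATE TABLE ModelSearchCounts (
    ModelID       INTEGER PRIMARY KEY REFERENCES CarModels(ModelID),
    TimesSearched INTEGER NOT NULL DEFAULT 0
);
INSERT INTO CarModels VALUES (1, 'Model A'), (2, 'Model B');
""")


def record_search(model_id):
    """Create the counter row on first use, then increment it."""
    conn.execute(
        "INSERT OR IGNORE INTO ModelSearchCounts (ModelID) VALUES (?)",
        (model_id,),
    )
    conn.execute(
        "UPDATE ModelSearchCounts SET TimesSearched = TimesSearched + 1 "
        "WHERE ModelID = ?",
        (model_id,),
    )


def most_popular():
    """All models, most searched first; never-searched models count as 0."""
    return conn.execute(
        """
        SELECT m.ModelID, m.Name, COALESCE(t.TimesSearched, 0) AS Searches
        FROM CarModels m
        LEFT OUTER JOIN ModelSearchCounts t ON m.ModelID = t.ModelID
        ORDER BY Searches DESC, m.ModelID
        """
    ).fetchall()
```

The left outer join keeps models that have never been searched in the result, which an inner join would drop.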

How to write a SELECT statement when I don't know in advance what columns I want?

My application has several tables: a master OBJECT table, and several
tables for storing specific kinds of objects: CAT, SHOE and BOOK.
Here's an idea of what the table columns look like:
object
object_id (primary key)
object_type (string)
cat
cat_id (primary key)
object_id (foreign key)
name (string)
color (string)
shoe
shoe_id (primary key)
object_id (foreign key)
model (string)
size (string)
book
book_id (primary key)
object_id (foreign key)
title (string)
author (string)
From the user's point of view, each specific object is primarily identified
by its name, which is a different column for each table. For the CAT table
it's name, for the SHOE table it's model, and for the BOOK table it's
title.
Let's say I'm handed an object_id without knowing in advance what kind of
object it represents -- a cat, a shoe or a book. How do I write a
SELECT statement to get this information?
Obviously it would look a little like this:
SELECT object_type,name FROM object WHERE object_id = 12345;
But how do I get the right contents in the "name" column?
It seems like you're describing a scenario where the user's view on the data (objects have names, I don't care what type they are) is different from the model you're using to store the data.
If that is the case, and assuming you have some control over the database objects, I'd probably create a VIEW, allowing you to coalesce similar data for each type of object.
Example on SQL Server:
CREATE VIEW object_names AS
SELECT object_id, name FROM cat
UNION ALL
SELECT object_id, model AS name FROM shoe
UNION ALL
SELECT object_id, title AS name FROM book
GO
You can then SELECT name FROM object_names WHERE object_id = 12345, without concerning yourself with the underlying column names.
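The same view also works outside SQL Server. Below is a small sketch against SQLite via Python's sqlite3 module, using the tables from the question; the sample rows and the describe helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (object_id INTEGER PRIMARY KEY, object_type TEXT NOT NULL);
CREATE TABLE cat  (cat_id  INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id),
                   name TEXT, color TEXT);
CREATE TABLE shoe (shoe_id INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id),
                   model TEXT, size TEXT);
CREATE TABLE book (book_id INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id),
                   title TEXT, author TEXT);

-- One uniform "name" column, whatever the underlying table calls it.
CREATE VIEW object_names AS
    SELECT object_id, name FROM cat
    UNION ALL
    SELECT object_id, model AS name FROM shoe
    UNION ALL
    SELECT object_id, title AS name FROM book;

INSERT INTO object VALUES (12345, 'shoe'), (12346, 'cat');
INSERT INTO shoe VALUES (1, 12345, 'Runner X', '42');
INSERT INTO cat  VALUES (1, 12346, 'Tom', 'grey');
""")


def describe(object_id):
    """Return (object_type, name) for an id, or None if it is unknown."""
    return conn.execute(
        """
        SELECT o.object_type, n.name
        FROM object o
        JOIN object_names n ON n.object_id = o.object_id
        WHERE o.object_id = ?
        """,
        (object_id,),
    ).fetchone()


print(describe(12345))
print(describe(12346))
```

The caller never needs to know which of the three tables holds the row.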
Your only real solutions basically boil down to the same thing: writing explicit statements for each specific table and unioning them into a single result set.
You can either do this in a view (giving you a dynamic database object that you can query) or as part of the query (whether it's straight SQL or a stored procedure). You don't mention which database you're using, but the basic query is something like this:
select object_id, name from cat where object_id = 12345 union all
select object_id, model from shoe where object_id = 12345 union all
select object_id, title from book where object_id = 12345
For SQL Server, the syntax for creating the view would be:
create view object_view as
select 'cat' as type, object_id, name from cat union all
select 'shoe', object_id, model from shoe union all
select 'book', object_id, title from book
And you could query like:
select type, name from object_view where object_id = 12345
However, what you have is a basic table inheritance pattern, but it's implemented improperly since:
The primary key of child tables (cat, shoe, book) should also be a foreign key to the parent table (object). You should not have a different key for this, unless two cat records can represent the same object (in which case this is not inheritance at all)
Common elements, such as a name, should be represented at the highest level of the hierarchy as appropriate (in this case in object, since all of the objects have the concept of a "name").
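A minimal sketch of what those two points could look like, again using SQLite through Python's sqlite3 module. The exact column choices and the add_shoe and name_of helpers are assumptions for illustration: each child table uses object_id as its own primary key and as a foreign key to object, and the shared name moves up into object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE object (
    object_id   INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL,
    name        TEXT NOT NULL      -- common attribute lives in the parent
);
-- Child tables reuse the parent's key instead of having their own id.
CREATE TABLE cat  (object_id INTEGER PRIMARY KEY REFERENCES object(object_id),
                   color TEXT);
CREATE TABLE shoe (object_id INTEGER PRIMARY KEY REFERENCES object(object_id),
                   size TEXT);
CREATE TABLE book (object_id INTEGER PRIMARY KEY REFERENCES object(object_id),
                   author TEXT);
""")


def add_shoe(object_id, model, size):
    """Insert the parent row and the shoe-specific row in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO object VALUES (?, 'shoe', ?)", (object_id, model)
        )
        conn.execute("INSERT INTO shoe VALUES (?, ?)", (object_id, size))


def name_of(object_id):
    """The original question becomes a plain single-table lookup."""
    return conn.execute(
        "SELECT object_type, name FROM object WHERE object_id = ?",
        (object_id,),
    ).fetchone()
```

With this layout, the query from the question (SELECT object_type, name FROM object WHERE object_id = 12345) works as written, and a child row cannot exist without its parent.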
do a join
http://www.tizag.com/sqlTutorial/sqljoin.php
You can't. Why not name them the same thing, or pull that name back to the OBJECT table? You can very easily create a column called Name and put that in the OBJECT table. This would still be normalized.
It seems like you have a few options. You could use a config file or a 'schema' table. You could rename your columns so that the name of the column is always the same. You could have the class in your code know its table. You could make your architecture a little less generic, and allow the data access layer to understand the data it's accessing.
Which to choose? What problem are you solving? What problem were you solving, whose solution created this problem?
There's really no way to do this without first SELECTing to find out the kind, then SELECTing a second time to get the actual data. If you only have a few different kinds of objects, you could do it with a single SELECT and a bunch of LEFT JOINs to join all the tables at once, but that doesn't scale well if you've got lots of joiner tables.
But just thinking outside the box a bit, does the "identifier" that users see have to correspond exactly to the primary key in the table? Could you encode the "kind" of the object in the identifier itself? So for example, if object_id 12345 is a shoe you could "encode" this as "S12345" from the user's point of view. A book would be "B4567" and a cat "C2578". Then in your code, just separate out the first letter and use that to decide which table to join on, and the remaining numbers are your primary key.
If you cannot alter the original table due to dependencies, you could probably create a view of the table with uniform column name. More information on how to create views can be found here.
There is a table you can look at that tells you the properties of all the tables (and column properties) in your db.
In Postgres this is information_schema.columns (the system catalogs in pg_catalog hold the same details); SQL Server also provides INFORMATION_SCHEMA.COLUMNS. You could query this and work out what you required, then construct a query based on that info...
EDIT: Sorry, re-reading the question, I don't think that is what you require. I've solved a similar problem before by having a surrogate key table: one table with all the ids in it and a type id, plus a serial/identity column that contains the primary key for that table. This is the id you should use. Then you can create a view which looks up the other information based on the type id in that table.
The 'entity ref' table would have columns 'entityref' (PK), 'id', 'type id' etc... (that is assuming you can't restructure to use inheritance)

Join performance

My situation is:
Table member
id
firstname
lastname
company
address data ( 5 fields )
contact data ( 2 fields )
etc
Table member_profile
member_id
html ( something like <h2>firstname lastname</h2><h3>Company</h3><span>date_registration</span> )
date_activity
chat_status
Table news
id
member_id (fk to member_id in member_profile)
title
...
The idea is that the full profile of a member, when viewed, is fetched from the member table; elsewhere, for instance in a news overview, the smaller table that holds the basic display info for a member is joined instead.
However, I have found that I more often need member info that is not stored in the member_profile table: e.g. firstname, lastname and gender are necessary when someone has posted a news item ("firstname has posted news titled title").
What would be better to do? Move the fields from the member_profile table to the member table, or move the member fields to the member_profile table and perhaps remove them from the member table? Keep in mind that the member_profile table is joined a lot, and also updated on each login, status update etc.
You have two tables named member, so I have the feeling your question isn't formed correctly.
What is the relationship between these tables? It looks like you have 3 tables, all one-to-one. So all you need to do is change (fk to member_id in member_profile) to (fk to id in member).
Now you can join in data from either of the 2 extra tables as you wish, without always having to go through member_profile.
[Edit] Also, I assume that member_profile.member_id is a fk to member.id. If not, I believe it should be :)
The easy option would be to combine them into one table, so that the name data is normalized, and then create 2 views which replicate the original two tables.
Separating the tables between mostly-static fields and frequently-updated fields will improve write performance. So I would stay with what you're doing. If you cache the information from both tables together in a member object, read performance (and thus joining) is less of an issue.

SQL naming best practice

I'm not entirely sure if there's a standard in the industry or otherwise, so I'm asking here.
I'm naming a Users table, and I'm not entirely sure about how to name the members.
user_id is an obvious one, but I wonder if I should prefix all other fields with "user_" or not.
user_name
user_age
or just name and age, etc...
Prefixes like that are pointless, unless you have something a little more arbitrary, like two addresses. Then you might use address_1, address_2, address_home, etc.
Same with phone numbers.
But for something as static as age, gender, username, etc; I would just leave them like that.
Just to show you
If you WERE to prefix all of those fields, your queries might look like this
SELECT users.user_id FROM users WHERE users.user_name = 'Jim'
When it could easily be
SELECT id FROM users WHERE username = 'Jim'
I agree with the other answers that suggest against prefixing the attributes with your table names.
However, I support the idea of using matching names for the foreign keys and the primary key they reference [1], and to do this you'd normally have to prefix the id attributes in the dependent table.
Something which is not very well known is that SQL supports a concise joining syntax using the USING keyword:
CREATE TABLE users (user_id int, first_name varchar(50), last_name varchar(50));
CREATE TABLE sales (sale_id int, purchase_date datetime, user_id int);
Then the following query:
SELECT s.*, u.last_name FROM sales s JOIN users u USING (user_id);
is equivalent to the more verbose and popular joining syntax:
SELECT s.*, u.last_name FROM sales s JOIN users u ON (u.user_id = s.user_id);
[1] This is not always possible. A typical example is a user_id field in a users table, and reported_by and assigned_to fields in the referencing table that both reference the users table. Using a user_id field in such situations is both ambiguous, and not possible for one of the fields.
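The equivalence of the two join forms can be checked quickly. This is a small sketch against SQLite via Python's sqlite3 module, assuming the users and sales tables from above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, purchase_date TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, 'Jim', 'Smith');
INSERT INTO sales VALUES (100, '2020-01-15', 1), (101, '2020-02-01', 1);
""")

# Join written with USING: works because both tables call the column user_id.
with_using = conn.execute(
    "SELECT s.*, u.last_name FROM sales s "
    "JOIN users u USING (user_id) ORDER BY s.sale_id"
).fetchall()

# The same join written with an explicit ON condition.
with_on = conn.execute(
    "SELECT s.*, u.last_name FROM sales s "
    "JOIN users u ON (u.user_id = s.user_id) ORDER BY s.sale_id"
).fetchall()

print(with_using == with_on)
```

USING only applies when the column has the same name on both sides, which is the practical argument for naming the key user_id in both tables.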
As other answers suggest, it is a personal preference - pick up certain naming schema and stick to it.
Some 10 years ago I worked with Oracle Designer, and it uses a naming schema that I liked and have used ever since:
table names are plural - USERS
surrogate primary key is named as the singular of the table name plus '_id' - the primary key for table USERS would be "USER_ID". This way you have consistent naming when you use the "USER_ID" field as a foreign key in some other table
column names don't have the table name as a prefix.
Optionally:
in databases with a large number of tables (interpret "large" as you see fit), use 2-3 character table prefixes so that you can logically divide tables into areas. For example: all tables that contain sales data (invoices, invoice items, articles) have the prefix "INV_", and all tables that contain human resources data have the prefix "HR_". That way it is easier to find and sort tables that contain related data (this could also be done by placing tables in different schemas and setting appropriate access rights, but it gets complicated when you need to create more than one database on one server)
Again, pick naming schema you like and be consistent.
Just go with name and age, the table should provide the necessary context when you're wondering what kind of name you're working with.
Look at it as an entity and name the fields accordingly
I'd suggest a User table, with fields such as id, name, age, etc.
A group of records is a bunch of users, but the group of fields represents a user.
Thus, you end up referring to user.id, user.name, user.age (though you won't always include the table name, depending on the query).
For the table names, I usually use pluralized nouns (or noun phrases), like you.
For column names I'd not use the table name as prefix. The table itself specifies the context of the column.
table users (plural):
id
name
age
plain and simple.
It's personal preference. The best advice we can give you is consistency, legibility and ensuring the relationships are correctly named as well.
Use names that make sense and aren't abbreviated if possible, unless the storage mechanism you are using doesn't work well with them.
In relationships, I like to use Id on the primary key and [table_name]_Id on the foreign key, e.g. Order.Id and OrderItem.OrderId.
Id works well if using a surrogate key as a primary key.
Also, your storage mechanism may or may not be case sensitive, so be sure to take that into account.
Edit: Also, there is some theory to suggest that a table should be named after what a single record in that table represents. So, table name "User" instead of "Users". Personally, the plural makes more sense to me; just keep it consistent.
First of all, I would suggest using the singular noun, i.e. user instead of users, although this is more of a personal preference.
Second, there are some who prefer to always name the primary key column id, instead of user_id (i.e. table name + id), and similar with for example name instead of employee_name. I think this is a bad idea for the following reason:
-- when every table has an "id" (or "name") column, you get duplicate column names in the output:
select e.id, e.name, d.id, d.name
from employee e, department d
where e.department_id = d.id
-- to avoid this, you need to specify column aliases every time you query:
select e.id employee_id, e.name employee_name, d.id department_id, d.name department_name
from employee e, department d
where e.department_id = d.id
-- if the column name includes the table, there are no conflicts, and the join condition is very clear
select e.employee_id, e.employee_name, d.department_id, d.department_name
from employee e, department d
where e.department_id = d.department_id
I'm not saying you should include the table name in every column in the table, but do it for the key (id) column and other "generic" columns such as name, description, remarks, etc. that are likely to be included in queries.
I explicitly named my columns using a prefix that was related to the table,
i.e. table = USERS, column name = user_id, user_name, user_address_street, etc.
Before that, when I started using JOINs, I had to alias the crap out of the column names to avoid conflicts in the query results. Then, when the results were accessed from templates in an MVC view, if a query result field name didn't match the published db schema, the template designers would get all confused and have to ask for the SQL VIEW to determine the correct field name to use.
So it looks messy to use a prefix in a column name, but in practice it works better for us.
I'm not entirely sure if there's a standard in the industry
Yes: ISO 11179-5: Naming and identification principles, available here.
I think table and column names should look like this.
Table Name :
User --> Capitalize each word, and not plural.
Column Names :
Id --> If I see "Id", I understand this is the PK column.
GroupId --> I understand there is a table named Group and this column is the relation column for the Group table.
Name --> If there is a column named "Name" in the User table, it means the name of the user. That is clear enough.
I support this even more if you are using Entity Framework.
Note: Sorry for my bad English. If somebody will correct my bad English i will be happy.