SQL naming best practice - sql

I'm not entirely sure if there's a standard in the industry or otherwise, so I'm asking here.
I'm naming a Users table, and I'm not entirely sure about how to name the members.
user_id is an obvious one, but I wonder if I should prefix all other fields with "user_" or not.
user_name
user_age
or just name and age, etc...

Prefixes like that are pointless unless you have something a little more arbitrary, like two addresses. Then you might use address_1, address_2, address_home, etc.
Same with phone numbers.
But for something as static as age, gender, username, etc., I would just leave them as they are.
Just to show you
If you WERE to prefix all of those fields, your queries might look like this:
SELECT users.user_id FROM users WHERE users.user_name = 'Jim'
When it could easily be:
SELECT id FROM users WHERE username = 'Jim'

I agree with the other answers that advise against prefixing the attributes with your table names.
However, I support the idea of using matching names for the foreign keys and the primary key they reference [1], and to do this you'd normally have to prefix the id attributes in the dependent table.
Something which is not very well known is that SQL supports a concise joining syntax using the USING keyword:
CREATE TABLE users (user_id int, first_name varchar(50), last_name varchar(50));
CREATE TABLE sales (sale_id int, purchase_date datetime, user_id int);
Then the following query:
SELECT s.*, u.last_name FROM sales s JOIN users u USING (user_id);
is equivalent to the more verbose and popular joining syntax:
SELECT s.*, u.last_name FROM sales s JOIN users u ON (u.user_id = s.user_id);
[1] This is not always possible. A typical example is a user_id field in a users table, and reported_by and assigned_to fields in the referencing table that both reference the users table. Using a user_id field in such situations is both ambiguous, and not possible for one of the fields.
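To check the claim that the two join forms are equivalent, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows and the TEXT date column are assumptions made for illustration; only the table and column names come from the answer above.

```python
import sqlite3

# In-memory database; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, purchase_date TEXT,
                        user_id INTEGER REFERENCES users (user_id));
    INSERT INTO users VALUES (1, 'Jim', 'Hall'), (2, 'Ann', 'Lee');
    INSERT INTO sales VALUES (10, '2020-01-05', 1), (11, '2020-01-06', 2), (12, '2020-01-07', 1);
""")

# Concise form: works because both tables call the key user_id.
using_rows = conn.execute(
    "SELECT s.sale_id, u.last_name FROM sales s "
    "JOIN users u USING (user_id) ORDER BY s.sale_id"
).fetchall()

# Verbose form: the same join written with ON.
on_rows = conn.execute(
    "SELECT s.sale_id, u.last_name FROM sales s "
    "JOIN users u ON u.user_id = s.user_id ORDER BY s.sale_id"
).fetchall()

print(using_rows == on_rows)
```

USING only helps when the key has the same name on both sides, which is the argument for matching foreign key and primary key names.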

As other answers suggest, it is a personal preference: pick a naming scheme and stick to it.
Some 10 years ago I worked with Oracle Designer, and it uses a naming scheme that I liked and have used ever since:
table names are plural - USERS
the surrogate primary key is named as the singular of the table name plus '_id' - the primary key for table USERS would be "USER_ID". This way you have consistent naming when you use the "USER_ID" field as a foreign key in some other table
column names don't have the table name as a prefix.
Optionally:
in databases with a large number of tables (interpret "large" as you see fit), use 2-3 character table prefixes so that you can logically divide tables into areas. For example: all tables that contain sales data (invoices, invoice items, articles) have the prefix "INV_", all tables that contain human resources data have the prefix "HR_". That way it is easier to find and sort tables that contain related data (this could also be done by placing tables in different schemas and setting appropriate access rights, but it gets complicated when you need to create more than one database on one server)
Again, pick a naming scheme you like and be consistent.

Just go with name and age; the table should provide the necessary context when you're wondering what kind of name you're working with.

Look at it as an entity and name the fields accordingly
I'd suggest a User table, with fields such as id, name, age, etc.
A group of records is a bunch of users, but the group of fields represents a user.
Thus, you end up referring to user.id, user.name, user.age (though you won't always include the table name, depending on the query).

For the table names, I usually use pluralized nouns (or noun phrases), like you.
For column names I'd not use the table name as prefix. The table itself specifies the context of the column.

table users (plural):
id
name
age
plain and simple.

It's personal preference. The best advice we can give you is consistency, legibility and ensuring the relationships are correctly named as well.
Use names that make sense and aren't abbreviated if possible, unless the storage mechanism you are using doesn't work well with them.
In relationships, I like to use Id on the primary key and [table_name]_Id on the foreign key, e.g. Order.Id and OrderItem.OrderId.
Id works well if using a surrogate key as a primary key.
Also, your storage mechanism may or may not be case sensitive, so be sure to take that into account.
Edit: Also, there is some theory to suggest that a table should be named after what a single record in that table represents. So, table name "User" instead of "Users" - personally the plural makes more sense to me, just keep it consistent.

First of all, I would suggest using the singular noun, i.e. user instead of users, although this is more of a personal preference.
Second, there are some who prefer to always name the primary key column id instead of user_id (i.e. table name + id), and similarly to use, for example, name instead of employee_name. I think this is a bad idea for the following reason:
-- when every table has an "id" (or "name") column, you get duplicate column names in the output:
select e.id, e.name, d.id, d.name
from employee e, department d
where e.department_id = d.id
-- to avoid this, you need to specify column aliases every time you query:
select e.id employee_id, e.name employee_name, d.id department_id, d.name department_name
from employee e, department d
where e.department_id = d.id
-- if the column name includes the table, there are no conflicts, and the join condition is very clear
select e.employee_id, e.employee_name, d.department_id, d.department_name
from employee e, department d
where e.department_id = d.department_id
I'm not saying you should include the table name in every column in the table, but do it for the key (id) column and other "generic" columns such as name, description, remarks, etc. that are likely to be included in queries.
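The duplicate-name problem is easy to reproduce. The following is a small sketch using Python's sqlite3 module; the sample rows and the helper column_names are made up for illustration, and other drivers may label unaliased columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           department_id INTEGER REFERENCES department (id));
    INSERT INTO department VALUES (7, 'Sales');
    INSERT INTO employee VALUES (1, 'Ann', 7);
""")

def column_names(sql):
    """Return the result column names the driver reports for a query."""
    cursor = conn.execute(sql)
    return [col[0] for col in cursor.description]

# Generic names: the result set reports 'id' and 'name' twice.
plain_sql = ("SELECT e.id, e.name, d.id, d.name "
             "FROM employee e JOIN department d ON e.department_id = d.id")
plain = column_names(plain_sql)

# Building a dict from such a row silently keeps only the last 'id' and 'name'.
row = conn.execute(plain_sql).fetchone()
as_dict = dict(zip(plain, row))

# Aliases (or table-qualified column names in the schema) remove the clash.
aliased = column_names(
    "SELECT e.id AS employee_id, e.name AS employee_name, "
    "d.id AS department_id, d.name AS department_name "
    "FROM employee e JOIN department d ON e.department_id = d.id")

print(plain, as_dict, aliased)
```

Any client code that looks columns up by name hits the same collision, which is why the aliases are needed on every such query.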

I explicitly named my columns using a prefix that was related to the table,
i.e. table = USERS, column names = user_id, user_name, user_address_street, etc.
Before that, when I started using JOINs, I had to alias the crap out of the column names to avoid conflicts in the query results. Then, when the results were accessed from templates in an MVC view and a query result field name didn't match the published db schema, the template designers would get confused and have to ask for the SQL VIEW to determine the correct field name to use.
So it looks messy to use a prefix in a column name, but in practice it works better for us.

I'm not entirely sure if there's a standard in the industry
Yes: ISO/IEC 11179-5, Naming and identification principles.

I think table and column names should look like this.
Table name:
User --> Capitalize each word, and don't pluralize.
Column names:
Id --> If I see "Id", I understand this is the PK column.
GroupId --> I understand that there is a table named Group and that this column is the relation column to the Group table.
Name --> If there is a column named "Name" in the User table, it means the name of the user. That's clear enough.
I would recommend this even more if you are using Entity Framework.

Related

SQL database structure with two changing properties

Let's assume I am building the backend of a university management software.
I have a users table with the following columns:
id
name
birthday
last_english_grade
last_it_grade
profs table columns:
id
name
birthday
I'd like to have a third table with which I can determine all professors teaching a student.
So I'd like to assign multiple teachers to each student.
Those Professors may change any time.
New students may be added any time too.
What's the best way to achieve this?
The canonical way to do this would be to introduce a third junction table, which exists mainly to relate users to professors:
CREATE TABLE users_profs (
user_id INT NOT NULL REFERENCES users (id),
prof_id INT NOT NULL REFERENCES profs (id),
PRIMARY KEY (user_id, prof_id)
);
The primary key of this junction table is the combination of a user and professor ID. Note that this table is fairly lean, and avoids the problem of repeating metadata for a given user or professor. Rather, user/professor information remains in your two original tables, and does not get repeated.
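As a rough sketch of how the junction table is used, the following Python sqlite3 example assigns professors to students and then changes an assignment. The sample names and the helper profs_of are made up for illustration, and the other columns from the question are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_profs (
        user_id INTEGER NOT NULL REFERENCES users (id),
        prof_id INTEGER NOT NULL REFERENCES profs (id),
        PRIMARY KEY (user_id, prof_id)
    );
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO profs VALUES (10, 'Prof. Grey'), (11, 'Prof. Stone');
    INSERT INTO users_profs VALUES (1, 10), (1, 11), (2, 11);
""")

def profs_of(user_id):
    """List the names of all professors currently teaching one student."""
    rows = conn.execute(
        "SELECT p.name FROM users_profs up "
        "JOIN profs p ON p.id = up.prof_id "
        "WHERE up.user_id = ? ORDER BY p.name",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]

before = profs_of(1)

# A professor stops teaching the student: only the link row changes.
conn.execute("DELETE FROM users_profs WHERE user_id = ? AND prof_id = ?", (1, 10))
after = profs_of(1)

print(before, after)
```

Adding a student or swapping a professor touches only rows in users_profs; the users and profs tables stay as they are.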

Sqlite design for cross linked tables and foreign keys usage

I don't know if this is the right place to ask. Because it is a question regarding sql database design I was thinking about database administrator but because the target of that site is database professionals (and I'm absolutely not a professional) I'll just post my question here. Please point me to the right place if you think there's a better place for this type of question.
Getting to the question.
I'm designing a database for translations of literary works. Because this involves people and people often don't fit in a "static" data model I have a pretty convoluted schema. Here is just a section of it, regarding people's names. Because foreign authors are involved (especially Japanese) I have the added problem of transliteration for people's names. At present the structure of the database for people and names is as follows
Let's take an example:
I have a person called "Kyokutei Bakin", which is written 曲亭馬琴 in ideograms and キョクテイ バキン in the Japanese phonetic alphabet. This author is also known as "Takizawa Bakin" (滝沢馬琴, タキザワ バキン) and so on...
The 3 table structure with one-to-many relationships accounts for a person having multiple names (biographical_name, pen_name, etc.) and for the fact that every name can have multiple phonetic readings.
This is all good. When I search for someone I just LEFT JOIN the tables and add OR conditions for the various fields, e.g.:
SELECT DISTINCT name.name_text, phonetic_name.name_text FROM name
LEFT JOIN phonetic_name ON (name.name_id=phonetic_name.name_id)
WHERE (name.name_text LIKE '%bak%')
OR (phonetic_name.name_text LIKE '%馬琴%');
My problem is that I want one of the names to be the main name of that person. The way I've done it is adding a "main_name" column in the "person" table that points to the "name_id" column of the "name" table. So that I can JOIN name ON (person.main_name=name.name_id) when I want just the main name.
My questions are:
-Is it a good practice to cross-link two tables?
(Here "name" references "person" on person_id, but at the same time "person" references "name" for main_name).
-Can this cause problems?
-How do I set foreign keys in this kind of situation?
-In case this is way too messy, how can I improve the design?
Additional info:
Since this is a design problem the SQL implementation should not matter much, but in case it does, I'm using sqlite3.
I would personally simplify the design like this:
Table: person
person_id (primary key)
...
Table: name
name_id (primary key)
name
name_type
parent_name_id (foreign key of itself)
person_id (foreign key of person table)
The table name has a recursive relationship where parent_name_id contains the name_id of the main name of the person. Note that for the main name name_id=parent_name_id. In the column name_type you can store the type of name (phonetic, ideogram, kanji, etc.). You can possibly normalize further the name_type into a dedicated table if you wish to have pure third normal form.
I would say the main benefit of this design is that it greatly simplifies your query when querying for names of any type. You can simply run something like this:
Select distinct b.person_id, b.name as main_name
From name a
Inner join name b on a.parent_name_id=b.name_id
Where a.name like '%...%'
In addition you can store as many names as you want for a single person.
If you want to return several names from different types you can do like this:
Select distinct b.person_id,
b.name as main_name,
c.name as kanji_name,
d.name as katakana_name
From name a
Inner join name b on a.parent_name_id=b.name_id
Left join name c on b.parent_name_id=c.parent_name_id and c.name_type='kanji'
Left join name d on b.parent_name_id=d.parent_name_id and d.name_type='katakana'
-- etc...
Where a.name like '%...%'
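Here is a minimal sketch of the self-referencing design, run against SQLite through Python's sqlite3 module. The name_type values ('romaji', 'kanji', 'katakana', 'pen_name'), the choice of main name and the helper main_names_matching are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE name (
        name_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        name_type TEXT NOT NULL,
        parent_name_id INTEGER NOT NULL REFERENCES name (name_id),
        person_id INTEGER NOT NULL REFERENCES person (person_id)
    );
    INSERT INTO person VALUES (1);
    -- Row 1 is the main name, so it points at itself (name_id = parent_name_id).
    INSERT INTO name VALUES
        (1, 'Kyokutei Bakin', 'romaji', 1, 1),
        (2, '曲亭馬琴', 'kanji', 1, 1),
        (3, 'キョクテイ バキン', 'katakana', 1, 1),
        (4, 'Takizawa Bakin', 'pen_name', 1, 1);
""")

def main_names_matching(pattern):
    """Search every stored name, but report the person's main name."""
    return conn.execute(
        "SELECT DISTINCT b.person_id, b.name AS main_name "
        "FROM name a "
        "INNER JOIN name b ON a.parent_name_id = b.name_id "
        "WHERE a.name LIKE ?",
        (pattern,),
    ).fetchall()

by_kanji = main_names_matching('%馬琴%')
by_pen_name = main_names_matching('%Takizawa%')

print(by_kanji, by_pen_name)
```

Both searches resolve to the same main name through a single self-join, without a person.main_name column pointing back at name.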

Composite primary key: Finding one attribute using another

Data fields
I am designing a database table structure. Say that we need to record employee profiles from different companies. We have the following fields:
+---------+--------------+-----+--------+-----+
| Company | EmployeeName | Age | Gender | Tel |
+---------+--------------+-----+--------+-----+
It's possible that two employees from different companies have the same name (assume that no two employees in the same company share a name). In this case a composite primary key (Company, EmployeeName) would be necessary in my opinion.
Search
Now I need to get all information by using only one of the 2 attributes in the primary key. For example,
I want to search all employees' profile of Company A:
SELECT EmployeeName, Age, Gender, Tel FROM table WHERE Company = 'Company A'
And I also want to search for all employees named Donald across different companies:
SELECT Company, Age, Gender, Tel FROM table WHERE EmployeeName = 'Donald'
Strategy
In order to implement this requirement, my strategy would be to store all data in a single table, which is easy to read and understand. However, I noticed that a search may take a long time, as the query may need to iterate through all rows. I would like to retrieve this information as quickly as possible. Would there be a better strategy for this?
First, your rows should have a unique identifier for each row -- identity/auto-increment/serial, depending on the database. Second, you might reconsider names being unique. Why can't two people at the same company have the same name?
In any case, you have a primary key on, say, (company, name). For the opposite search you simply want another index on (name, company):
create index idx_profiles_name_company on profiles(name, company);
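To see the second index being used, here is a small sketch with Python's sqlite3 module. The column types, the sample rows and the helper plan are assumptions for illustration, and the exact wording of the query plan differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (
        company TEXT NOT NULL,
        name TEXT NOT NULL,
        age INTEGER,
        gender TEXT,
        tel TEXT,
        PRIMARY KEY (company, name)      -- serves lookups by company
    );
    -- Reverse column order: serves lookups by name.
    CREATE INDEX idx_profiles_name_company ON profiles (name, company);
    INSERT INTO profiles VALUES
        ('Company A', 'Donald', 40, 'M', '555-0100'),
        ('Company A', 'Mary', 35, 'F', '555-0101'),
        ('Company B', 'Donald', 29, 'M', '555-0102');
""")

def plan(sql, params):
    """Return SQLite's query plan description (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

by_name_sql = "SELECT company, age, gender, tel FROM profiles WHERE name = ?"
by_name_plan = plan(by_name_sql, ("Donald",))
donalds = conn.execute(by_name_sql + " ORDER BY company", ("Donald",)).fetchall()

print(by_name_plan)
print(donalds)
```

Without the second index, the search by name has no index whose leading column is name, so the engine would typically fall back to scanning the table.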
A note explaining Gordon's suggestion for an identity on each row. This is supplemental to his answer above.
In theory there is nothing wrong with a primary key that crosses columns and in a db like PostgreSQL I like to have identity values as secondary keys (i.e. not null unique) and specify natural primary keys. Of course on MS SQL Server or MySQL/InnoDB that would be a recipe for problems. I would also not say "all" but rather "almost all" since there are times when breaking this rule is good.
Regardless, having an identity column simplifies a couple of things, and it provides an abstraction around keys in case you get things wrong. Composite keys cause a couple of issues that end up eating time (and possibly resulting in downtime) later. These include:
Joins on composite keys are often more expensive than those on simple values, and
Adding or changing a natural primary key which crosses columns is far harder when joins are involved
So depending on your db you should either specify a unique secondary key or make your natural primary key separate (which you should do depends on storage and implementation specifics).

Table field naming convention and SQL statements

I have a practical question regarding naming table fields in a database. For example, I have two tables:
student (id int, name varchar(30))
teacher (id int, s_id int, name varchar(30))
Both tables have 'id' and 'name' columns. In a SQL statement, these two names will be ambiguous if they are not prefixed with a table name. Two options:
1. use the table name as a prefix of a field in the SQL 'where' clause
2. use prefixed field names in the tables so that no prefix is needed in the 'where' clause.
Which one is better?
Without a doubt, go with option 1. This is valid SQL in any type of database and is considered the proper and most readable format. It's a good habit to prefix a column with its table name, and it is necessary when a join involves columns with the same name. The only exception I've commonly seen is prefixing the id column's name with the table name, but I still wouldn't do that.
If you go with option 2, seasoned DBAs will probably point and laugh at you.
For further proof, see #2 here: https://www.periscopedata.com/blog/better-sql-schema.html
And here. Rule 1b - http://www.isbe.net/ILDS/pdf/SQL_server_standards.pdf
As TT mentions, you'll make your life much easier if you learn how to use an alias for the table name. It's as simple as using SomeTableNameThatsWayTooLong AS LT in your query, such as:
SELECT LT.Id FROM SomeTableNameThatsWayTooLong AS LT
For queries that aren't ad-hoc, you should always prefix every field with either the table name or table alias, even if the field name isn't ambiguous. This prevents the query from breaking later if someone adds a new column to one of the tables that introduces ambiguity.
So that would make "id" and "name" unambiguous. But I still recommend naming the primary key with something more specific than "id". In your example, I would use student_id and teacher_id. This helps prevent mistakes in joins. You will need more specific names anyway when you run into tables with more than one unique key, or multi-part keys.
It's worth thinking these things through, but in the end consistency may be the more important factor. I can deal with tables built around id instead of student_id, but I'm currently working with an inconsistent schema that uses all of the following: id, sid, systemid and specific names like taskid. That's the worst of both worlds.
I would use aliases rather than table names.
You can assign an alias to a table in a query, one that is shorter than the table name. That makes the query a lot more readable. Example:
SELECT
t.name AS teacher_name,
s.name AS student_name
FROM
teacher AS t
INNER JOIN student AS s ON
s.id=t.s_id;
You can of course use the table name if you don't use aliases, and that would be preferred over your option 2.
If it doesn't get too long, I prefer prefixing in the tables themselves, e.g. teacher.teacher_id, student.student_name. That way, you are always sure which name or id you are talking about, even if you forget to prefix the table name.

Custom Fields DB Design for Membership Application

I need to design a database for a membership application which needs custom classification for multiple organizations.
The data sets are as follows:
Organization type 1:
Name, Email, Joining Year, End Year, Role, Location.
Organization type 2:
Name, Email, Joining Year, End Year, Role, Department, Sub Organization, Location
Organization type 3:
Name, Email, Joining Year, End Year, Role, Identification No.
What would be the best way to design a database for it?
A few fields are common, a few are specific to an org type, and the number of org types is limited.
Option 1:
members_table - member_id, name, email, joining_year, end_year, role
members_org_type_1 - member_id, location
members_org_type_2 - member_id, department, sub_org, location
members_org_type_3 - member_id, id_no
Option 2:
members_table - member_id, name, email, joining_year, end_year, role
member_fields - member_id, field_type, field_value
field_labels - field_type, field_label
The second option looks promising, but I do not know how to join members_table and member_fields to get the required fields.
This is a common problem in database design and there are 3 common ways to deal with it (4 if we count EAV):
Three separate tables, one for each type.
One table - with a lot of columns - where some of them will be allowed to have Nulls. Integrity (which column combinations may be Null and which not) cannot easily be enforced by the database and is usually handled by the application. This is @noa's answer; it results in slightly less code and probably makes it easier to come up with a working (although not perfect) application.
One Member table (this is the supertype) and 3 additional tables, one for each subtype. This allows you to have no Nulls and to enforce which columns will be used, depending on the organization type. (this is your Option 1)
You can also add an org_type column in all tables. This will mean an additional UNIQUE constraint on Member (org_type, member_ID) and the FOREIGN KEY constraint (from each subtype table) altered to include this org_type column. Something like this:
CREATE TABLE Member
( MemberID
, Org_Type
, Name
, ...
, Role
, PRIMARY KEY (MemberID)
, UNIQUE KEY (Org_Type, MemberID)
, CHECK (Org_Type IN (1, 2, 3))
) ;
CREATE TABLE Member_Type_1
( MemberID
, Org_Type
, Location
, PRIMARY KEY (MemberID)
, FOREIGN KEY (Org_Type, MemberID)
REFERENCES Member(Org_Type, MemberID)
, CHECK (Org_Type = 1)
) ;
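The supertype/subtype layout with the extra org_type column can be sketched in SQLite through Python's sqlite3 module as follows. The column types, the sample rows and the helper try_insert are assumptions for illustration, and only the type 1 subtype table is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        org_type INTEGER NOT NULL CHECK (org_type IN (1, 2, 3)),
        name TEXT NOT NULL,
        UNIQUE (org_type, member_id)     -- target of the composite foreign key
    );
    CREATE TABLE member_type_1 (
        member_id INTEGER PRIMARY KEY,
        org_type INTEGER NOT NULL CHECK (org_type = 1),
        location TEXT,
        FOREIGN KEY (org_type, member_id) REFERENCES member (org_type, member_id)
    );
    INSERT INTO member VALUES (1, 1, 'Ann'), (2, 2, 'Bob');
    INSERT INTO member_type_1 VALUES (1, 1, 'Berlin');
""")

def try_insert(member_id, org_type, location):
    """Attempt a subtype insert; return 'ok' or the database's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO member_type_1 VALUES (?, ?, ?)",
                         (member_id, org_type, location))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

# Bob is a type 2 member, so neither attempt can put him in the type 1 table.
claims_type_1 = try_insert(2, 1, 'Paris')  # no (1, 2) row in member: foreign key fails
claims_type_2 = try_insert(2, 2, 'Paris')  # org_type must be 1 here: CHECK fails

print(claims_type_1)
print(claims_type_2)
```

The two constraints work together: the CHECK pins each subtype table to one org type, and the composite foreign key forces that type to agree with the member row.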
and finally there's (your option 2) EAV:
Entity-Attribute-Value model is, according to Wikipedia:
a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix. EAV is also known as object–attribute–value model, vertical database model and open schema.
There are various reasons not to use EAV in relational databases, mainly because of problems regarding datatype and referential integrity (that cannot be easily enforced), difficulty in writing even simple queries (that end up written with a lot of joins) and efficiency. See the answer by Simon Righarts at DBA.SE question: Is there a name for this database structure?
There are reasons that it's a valid option in certain cases though, as the article by Aaron Bertrand explains: What is so bad about EAV, anyway?, especially when you have a lot of columns and even more when you don't know in advance what columns you will need (custom made by customers). That may be your case, if you want the organizations to be able to add custom columns.
Note, however, that it's not easy to construct an efficient EAV model/application. You are actually building an RDBMS inside a database.
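On the question of how to join members_table and member_fields in Option 2: one common approach is conditional aggregation, which turns the attribute rows back into columns. Below is a minimal sketch using Python's sqlite3 module; the sample rows, the field_type values and the reduced column list are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members_table (member_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE field_labels (field_type TEXT PRIMARY KEY, field_label TEXT);
    CREATE TABLE member_fields (
        member_id INTEGER NOT NULL REFERENCES members_table (member_id),
        field_type TEXT NOT NULL REFERENCES field_labels (field_type),
        field_value TEXT,
        PRIMARY KEY (member_id, field_type)
    );
    INSERT INTO members_table VALUES
        (1, 'Ann', 'ann@example.com'), (2, 'Bob', 'bob@example.com');
    INSERT INTO field_labels VALUES
        ('location', 'Location'), ('department', 'Department');
    INSERT INTO member_fields VALUES
        (1, 'location', 'Berlin'),
        (2, 'location', 'Paris'),
        (2, 'department', 'Finance');
""")

# One output column per wanted attribute; members lacking a field get NULL.
rows = conn.execute("""
    SELECT m.member_id,
           m.name,
           MAX(CASE WHEN f.field_type = 'location' THEN f.field_value END) AS location,
           MAX(CASE WHEN f.field_type = 'department' THEN f.field_value END) AS department
    FROM members_table m
    LEFT JOIN member_fields f ON f.member_id = m.member_id
    GROUP BY m.member_id, m.name
    ORDER BY m.member_id
""").fetchall()

print(rows)
```

Every additional custom field needs another CASE expression (or another join), and every value comes back as text, which illustrates the query and datatype drawbacks of EAV mentioned above.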
If org types are limited and rarely changing, just use one table:
members_table - member_id, name, email, joining_year, end_year, role, location, department, sub_org, id_no
Use null values in the fields which aren't relevant to the organization type, and hide the non-applicable fields when you present the information.
I gave a similar answer here, though it was for a different database.