Table field naming convention and SQL statements - sql

I have a practical question regarding naming table fields in a database. For example, I have two tables:
student (id int, name varchar(30))
teacher (id int, s_id int, name varchar(30))
Both tables have 'id' and 'name' columns, so in an SQL statement they are ambiguous unless a table name is prefixed. Two options:
1. Use the table name as a prefix of the field in the SQL 'where' clause.
2. Use prefixed field names in the tables so that no prefix is needed in the 'where' clause.
Which one is better?

Without a doubt, go with option 1. This is valid SQL in any type of database and is considered the proper and most readable format. It's a good habit to prefix the table name to a column, and it is necessary when a join involves tables that share a column name. The one exception I've seen most often is prefixing the id column with the table name, but I still wouldn't do that.
If you go with option 2, seasoned DBAs will probably point and laugh at you.
For further proof, see #2 here: https://www.periscopedata.com/blog/better-sql-schema.html
And here. Rule 1b - http://www.isbe.net/ILDS/pdf/SQL_server_standards.pdf
As TT mentions, you'll make your life much easier if you learn how to use an alias for the table name. It's as simple as writing SomeTableNameThatsWayTooLong AS LT in your query, such as:
SELECT LT.Id FROM SomeTableNameThatsWayTooLong AS LT

For queries that aren't ad-hoc, you should always prefix every field with either the table name or table alias, even if the field name isn't ambiguous. This prevents the query from breaking later if someone adds a new column to one of the tables that introduces ambiguity.
So that would make "id" and "name" unambiguous. But I still recommend naming the primary key with something more specific than "id". In your example, I would use student_id and teacher_id. This helps prevent mistakes in joins. You will need more specific names anyway when you run into tables with more than one unique key, or multi-part keys.
It's worth thinking these things through, but in the end consistency may be the more important factor. I can deal with tables built around id instead of student_id, but I'm currently working with an inconsistent schema that uses all of the following: id, sid, systemid and specific names like taskid. That's the worst of both worlds.
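The point about unqualified columns breaking later can be sketched with Python's built-in sqlite3 module. This is only an illustration: the email column and the sample rows are hypothetical additions to the student/teacher tables from the question, and other engines word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(30), email VARCHAR(50));
    CREATE TABLE teacher (id INTEGER, s_id INTEGER, name VARCHAR(30));
    INSERT INTO student VALUES (1, 'Ann', 'ann@example.com');
    INSERT INTO teacher VALUES (10, 1, 'Mr. Lee');
""")

unqualified = "SELECT email FROM student JOIN teacher ON teacher.s_id = student.id"
qualified = "SELECT s.email FROM student AS s JOIN teacher AS t ON t.s_id = s.id"

# Both forms work while only student has an email column.
before = (conn.execute(unqualified).fetchall(), conn.execute(qualified).fetchall())

# Someone later adds a column with the same name to the other table.
conn.execute("ALTER TABLE teacher ADD COLUMN email VARCHAR(50)")

try:
    conn.execute(unqualified).fetchall()
    unqualified_error = None
except sqlite3.OperationalError as exc:
    unqualified_error = str(exc)

# The qualified query is unaffected by the schema change.
after = conn.execute(qualified).fetchall()
print(before, unqualified_error, after)
```

The unqualified query only kept working as long as nobody touched the schema; the qualified one does not care.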

I would use aliases rather than table names.
You can assign an alias to a table in a query, one that is shorter than the table name. That makes the query a lot more readable. Example:
SELECT
t.name AS teacher_name,
s.name AS student_name
FROM
teacher AS t
INNER JOIN student AS s ON
s.id=t.s_id;
You can of course use the table name if you don't use aliases, and that would be preferred over your option 2.

If it doesn't get too long, I prefer prefixing in the tables themselves, e.g. teacher.teacher_id, student.student_name. That way, you are always sure which name or id you are talking about, even if you forget to prefix the table name.

Related

Database Technology - postgresql

I am quite new to postgresql. Could any expert help me solve this problem please.
Consider the following PostgreSQL tables created for a university system recording which students take which modules:
CREATE TABLE module (id bigserial, name text);
CREATE TABLE student (id bigserial, name text);
CREATE TABLE takes (student_id bigint, module bigint);
Rewrite the SQL to include sensible primary keys.
CREATE TABLE module
(
m_id bigserial,
name text,
CONSTRAINT m_key PRIMARY KEY (m_id)
);
CREATE TABLE student
(
s_id bigserial,
name text,
CONSTRAINT s_key PRIMARY KEY (s_id)
);
CREATE TABLE takes
(
student_id bigint,
module bigint,
CONSTRAINT t_key PRIMARY KEY (student_id)
);
Given this schema I have the following questions:
Write an SQL query to count how many students are taking DATABASE.
SELECT COUNT(name)
FROM student
WHERE module = 'DATABASE' AND student_id=s_id
Write an SQL query to show the student IDs and names (but nothing else) of all students taking DATABASE.
SELECT s_id, name
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
Write an SQL query to show the student IDs and names (but nothing else) of all students not taking DATABASE.
SELECT s_id, name
FROM Student, take
WHERE student_id = s_id AND module != 'DATABASE'
Above are my answers. Please correct me if I am wrong and please comment the reason. Thank you for your expertise.
This looks like homework so I'm not going to give a detailed answer. A few hints:
I found one case where you used ´ quotes instead of ' apostrophes. This suggests you're writing SQL in something like Microsoft Word, which does so-called "smart quotes". Don't do that. Use a sensible text editor. If you're on Windows, Notepad++ is a popular choice. (Fixed it when reformatting the question, but wanted to mention it.)
Don't use the legacy comma-style join syntax FROM table1, table2, table3 WHERE .... It's horrible to read and it's much easier to make mistakes with. You should never have been taught it in the first place. Also, qualify your columns: take.module, not just module. Always write explicit ANSI joins, e.g. in your example above:
FROM Student, take
WHERE module = 'DATABASE' AND student_id = s_id
becomes
FROM student
INNER JOIN take
ON take.module = 'DATABASE'
AND take.student_id = student.s_id;
(if the table names are long you can use aliases like FROM student s then s.s_id)
Query 3 is totally wrong. Imagine that take has two rows for a student, one where the student is taking database and one where they're taking cooking. Your query will still return a result for them, even though they're taking database. (It would also return the same student ID multiple times, which you don't want.) Think about subqueries. You will need to query the student table, using a NOT EXISTS (SELECT .... FROM take ...) to filter out students who are taking database. The rest you get to figure out on your own.
Also, your schemas don't actually enforce the constraint that a student may only take DATABASE once at a time. Either add that, or consider in your queries the possibility that a student might be registered for DATABASE twice.
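For readers who want to check their own attempt against the hints above, here is one possible sketch using Python's built-in sqlite3 module. It is not the only way to write it. Assumptions: SQLite's INTEGER PRIMARY KEY stands in for PostgreSQL's bigserial, takes.module is treated as a reference to module.m_id, the composite primary key on takes is my own choice, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE module (m_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (s_id INTEGER PRIMARY KEY, name TEXT);
    -- Composite key: a student can take many modules, but each one only once.
    CREATE TABLE takes (
        student_id INTEGER REFERENCES student (s_id),
        module INTEGER REFERENCES module (m_id),
        PRIMARY KEY (student_id, module)
    );
    INSERT INTO module VALUES (1, 'DATABASE'), (2, 'COOKING');
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO takes VALUES (1, 1), (1, 2), (2, 2);
""")

# The != filter keeps Ann, because her COOKING row passes the test,
# and it misses Cy, who takes nothing at all.
naive = conn.execute("""
    SELECT student.s_id, student.name
    FROM student
    INNER JOIN takes ON takes.student_id = student.s_id
    INNER JOIN module ON module.m_id = takes.module
    WHERE module.name != 'DATABASE'
    ORDER BY student.s_id
""").fetchall()

# NOT EXISTS drops every student who has any DATABASE row.
not_taking = conn.execute("""
    SELECT student.s_id, student.name
    FROM student
    WHERE NOT EXISTS (
        SELECT 1
        FROM takes
        INNER JOIN module ON module.m_id = takes.module
        WHERE takes.student_id = student.s_id
          AND module.name = 'DATABASE'
    )
    ORDER BY student.s_id
""").fetchall()

print(naive)
print(not_taking)
```

With this sample data the naive query returns Ann and Bob, while the NOT EXISTS version returns Bob and Cy.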

SQL: Ambiguity on key fields

I don't understand why the interpreter cannot handle the following:
SELECT id
FROM a
INNER JOIN b ON a.id = b.id
This query will result in an error: Ambiguous column name 'id'
Which makes sense because the column is defined in multiple tables in my query. However, I clearly stated to only return the rows where the ids of both tables are the same. So it wouldn't matter what table the id is from.
So just out of curiosity: Is there a reason why the interpreter demands a table for the field?
(My example is from SQLServer, not sure if other interpreters CAN handle this?)
Let's be clear about a few things. First, it is always a good idea to include table aliases when referring to columns. This makes the SQL easier to understand.
Second, you are assuming that because of the = in the on condition, the two fields are the same. This is not true. The values are the same.
For instance, one field could be int and the other float (I do not recommend using float for join keys, but it is allowed). What is the type of id? SQL wants to assign a type to all columns, and it is not clear what type to assign.
More common examples abound. One id might be a primary key and defined NOT NULL. The other might be a foreign key and quite nullable. What is the nullability of just id?
In other words, SQL is doing the right thing. This is not about whether SQL can recognize something obvious, which sometimes it does. This is about a column being genuinely ambiguous and the SQL compiler not knowing how to define the result in the SELECT clause.
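The "same values, different fields" point can be seen in a small sketch with Python's built-in sqlite3 module. The column declarations here (INTEGER NOT NULL versus REAL) are assumptions chosen to make the difference visible; SQLite's typing rules also differ from SQL Server's, so treat this as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER NOT NULL);
    CREATE TABLE b (id REAL);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (1);
""")

# The join condition matches, yet the two columns still have different types.
row = conn.execute("""
    SELECT a.id, typeof(a.id), b.id, typeof(b.id)
    FROM a
    INNER JOIN b ON a.id = b.id
""").fetchone()
print(row)
```

The row joins because 1 equals 1.0, but a.id comes back as an integer and b.id as a real, so a bare id would have no single obvious type.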
How do you expect the interpreter to know which column to use?
Since it doesn't have a real brain (sadly..!), you need to explicitly specify the table where you want the id from.
In this example it could be :
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id
Even if the id values are the same, the column still has to come from one of the tables which the interpreter cannot choose for you ;-)
The SELECT id should be SELECT a.id. Since id is in both tables, the interpreter does not know which one you are referring to.
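The question is about SQL Server, but the same behaviour can be reproduced with Python's built-in sqlite3 module. A minimal sketch, with made-up rows in tables a and b:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Unqualified id: SQLite refuses to pick a table for you.
try:
    conn.execute("SELECT id FROM a INNER JOIN b ON a.id = b.id").fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Qualified id: the query runs.
rows = conn.execute("SELECT a.id FROM a INNER JOIN b ON a.id = b.id").fetchall()
print(error)
print(rows)
```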

When is it required to give a table name an alias in SQL?

I noticed when doing a query with multiple JOINs that my query didn't work unless I gave one of the table names an alias.
Here's a simple example to explain the point:
This doesn't work:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
This does:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases as p on items.date=p.purchase_date
group by folder_id
Can someone explain this?
You are using the same table Purchases twice in the query. You need to differentiate them by giving a different name.
You need to give an alias:
When the same table name is referenced multiple times
Imagine two people having the exact same name, John Doe. If you call John, both will respond to your call. You can't give the same name to two people and assume that they will know who you are calling. Similarly, when you give two result sets exactly the same name, SQL cannot identify which one to take values from. You need to give them different names so the SQL engine doesn't get confused.
Script 1: t1 and t2 are the alias names here
SELECT t1.col2
FROM table1 t1
INNER JOIN table1 t2
ON t1.col1 = t2.col1
When there is a derived table/sub query output
If a person doesn't have a name, you can't call them, so they won't respond to you. Similarly, when you generate a derived table output or subquery output, it is something unknown to the SQL engine, and it won't know what to call it. So, you need to give a name to the derived output so that the SQL engine can deal with it appropriately.
Script 2: t1 is the alias name here.
SELECT col1
FROM
(
SELECT col1
FROM table1
) t1
The only times it is REQUIRED to provide an alias are when you reference the same table multiple times and when you have derived outputs (sub-queries acting as tables) (thanks for catching that out Siva). This is so that you can get rid of ambiguities between which table reference to use in the rest of your query.
To elaborate further, in your example:
SELECT subject
from items
join purchases on items.folder_id=purchases.item_id
join purchases on items.date=purchases.purchase_date
group by folder_id
My assumption is that you feel that each join and its corresponding on will use the correlating table, however you can use whichever table reference you want. So, what happens is that when you say on items.date=purchases.purchase_date, the SQL engine gets confused as to whether you mean the first purchases table, or the second one.
By adding the alias, you now get rid of the ambiguities by being more explicit. The SQL engine can now say with 100% certainty which version of purchases that you want to use. If it has to guess between two equal choices, then it will always throw an error asking for you to be more explicit.
It is required to give them a name when the same table is used twice in a query. In your case, the query wouldn't know what table to choose purchases.purchase_date from.
In this case it's simply that you've specified purchases twice and the SQL engine needs to be able to refer to each dataset in the join in a unique way, hence the alias is needed.
As a side point, do you really need to join into purchases twice? Would this not work:
SELECT
subject
from
items
join purchases
on items.folder_id=purchases.item_id
and items.date=purchases.purchase_date
group by folder_id
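To make the difference concrete, here is a rough sketch using Python's built-in sqlite3 module. The column types and sample rows are assumptions, and the GROUP BY from the question is left out to keep the comparison simple. SQLite rejects the unaliased double reference as ambiguous; other engines word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (folder_id INTEGER, date TEXT, subject TEXT);
    CREATE TABLE purchases (item_id INTEGER, purchase_date TEXT);
    INSERT INTO items VALUES (1, '2020-01-01', 'first'), (2, '2020-01-02', 'second');
    INSERT INTO purchases VALUES
        (1, '2020-01-01'), (2, '2020-03-05'), (3, '2020-01-02');
""")

# Same table twice without aliases: "purchases" no longer names one thing.
try:
    conn.execute("""
        SELECT items.subject FROM items
        JOIN purchases ON items.folder_id = purchases.item_id
        JOIN purchases ON items.date = purchases.purchase_date
    """).fetchall()
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Two aliases: each condition may be satisfied by a different purchase row.
two_aliases = conn.execute("""
    SELECT items.subject FROM items
    JOIN purchases AS p1 ON items.folder_id = p1.item_id
    JOIN purchases AS p2 ON items.date = p2.purchase_date
    ORDER BY items.folder_id
""").fetchall()

# One join with AND: both conditions must hold for the same purchase row.
single_join = conn.execute("""
    SELECT items.subject FROM items
    JOIN purchases AS p
      ON items.folder_id = p.item_id
     AND items.date = p.purchase_date
    ORDER BY items.folder_id
""").fetchall()

print(error)
print(two_aliases)
print(single_join)
```

Note that the two working queries are not interchangeable: with this data the two-alias version returns both items, while the single join returns only the item whose id and date match the same purchase.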
The alias are necessary to disambiguate the table from which to get a column.
So, if the column's name is unique in the list of all possible columns available in the tables in the from list, then you can use the column name directly.
If the column's name is repeated in several of the tables available in the from list, then the DB server has no way to guess which is the right table to get the column.
In your sample query all the column names are duplicated because you're getting "two instances" of the same table (purchases), so the server needs to know which instance to take the column from. So you must specify it.
In fact, I'd recommend that you always use an alias, unless there's a single table. This way you'll avoid lots of problems and make the query much easier to understand.
You can't use the same table name twice in the same query UNLESS it is aliased as something else to prevent an ambiguous join condition. That's why it's not allowed. I should note, it's also better to always qualify with table.field or alias.field so other developers behind you don't have to guess which columns are coming from which tables.
When writing a query, YOU know what you are working with, but how about the person behind you in development? If someone is not used to what columns come from what table, it can be ambiguous to follow, especially out here at S/O. By always qualifying with the table reference and field, or the alias reference and field, it's much easier to follow.
select
SomeField,
AnotherField
from
OneOfMyTables
Join SecondTable
on SomeID = SecondID
compare that to
select
T1.SomeField,
T2.AnotherField
from
OneOfMyTables T1
JOIN SecondTable T2
on T1.SomeID = T2.SecondID
In these two scenarios, which would you prefer reading? Notice, I've simplified the query using shorter aliases "T1" and "T2", but they could be anything, even an acronym or abbreviated alias of the table names... "oomt" (one of my tables) and "st" (second table). Or, for something super long, as has appeared in other posts...
Select * from ContractPurchaseOffice_AgencyLookupTable
vs
Select * from ContractPurchaseOffice_AgencyLookupTable AgencyLkup
If you had to keep qualifying joins, or field columns, which would you prefer looking at.
Hope this clarifies your question.

Can alias in SQL be used within the same table?

I have difficulty understanding alias. Can alias in SQL be used within the same table?
In a query, you can use multiple aliases for a single table:
SELECT alias1.Name, alias2.Name
FROM table as alias1
INNER JOIN table as alias2
ON alias1.ChildId = alias2.Id
In the code above I am aliasing table as alias1 and alias2. It is the same table, with 2 different aliases.
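A runnable version of the same idea, sketched with Python's built-in sqlite3 module. The table name person, the aliases parent and child, and the sample rows are hypothetical; the original uses the placeholder name table, which is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (Id INTEGER PRIMARY KEY, Name TEXT, ChildId INTEGER);
    INSERT INTO person VALUES (1, 'Alice', 2), (2, 'Bob', NULL);
""")

# One table, two aliases: each alias acts as a separate copy of person.
rows = conn.execute("""
    SELECT parent.Name, child.Name
    FROM person AS parent
    INNER JOIN person AS child ON parent.ChildId = child.Id
""").fetchall()
print(rows)
```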
Not sure I understand your question completely...
A good read on aliases: http://www.w3schools.com/sql/sql_alias.asp
http://www.sqltutorial.org/sqlalias.aspx
There are 2 kinds of aliases, one for tables and one for columns. Aliases are used as a way to make your SQL code more readable. They can give meaningful names to columns and tables whose names might be long and/or confusing.
Check the w3schools brief description and examples for SQL Alias.
You can give a table or a column another name by using an alias. This can be a good thing to do if you have very long or complex table names or column names.
Which alias are you referring to: 'table alias' or 'column alias'?
In the SQL-92 Standard, the vernacular 'table alias' is referred to as a correlation name. A correlation name must be unique within its scope. The actual wording is as follows:
"An identifier that is a correlation name is associated with a table within a particular scope. The scope of a correlation name is either a select statement: single row, subquery, or query specification. Scopes may be nested. In different scopes, the same correlation name may be associated with different tables or with the same table."
In the SQL-92 Standard, the vernacular 'column alias' is referred to (rather wordily) as an as clause that contains a column name. There is no general condition that the same column name shall not be specified more than once in column lists (but there are context-specific restrictions, e.g. a view column list). In fact, SQL's allowance of duplicate column names is often cited as a fatal flaw as regards being truly relational.

Sql naming best practice

I'm not entirely sure if there's a standard in the industry or otherwise, so I'm asking here.
I'm naming a Users table, and I'm not entirely sure about how to name the members.
user_id is an obvious one, but I wonder if I should prefix all other fields with "user_" or not.
user_name
user_age
or just name and age, etc...
Prefixes like that are pointless unless you have something a little more arbitrary, like two addresses. Then you might use address_1, address_2, address_home, etc.
Same with phone numbers.
But for something as static as age, gender, username, etc; I would just leave them like that.
Just to show you
If you WERE to prefix all of those fields, your queries might look like this
SELECT users.user_id FROM users WHERE users.user_name = 'Jim'
When it could easily be
SELECT id FROM users WHERE username = 'Jim'
I agree with the other answers that suggest against prefixing the attributes with your table names.
However, I support the idea of using matching names for the foreign keys and the primary key they reference [1], and to do this you'd normally have to prefix the id attributes in the dependent table.
Something which is not very well known is that SQL supports a concise joining syntax using the USING keyword:
CREATE TABLE users (user_id int, first_name varchar(50), last_name varchar(50));
CREATE TABLE sales (sale_id int, purchase_date datetime, user_id int);
Then the following query:
SELECT s.*, u.last_name FROM sales s JOIN users u USING (user_id);
is equivalent to the more verbose and popular joining syntax:
SELECT s.*, u.last_name FROM sales s JOIN users u ON (u.user_id = s.user_id);
[1] This is not always possible. A typical example is a user_id field in a users table, and reported_by and assigned_to fields in the referencing table that both reference the users table. Using a user_id field in such situations is both ambiguous, and not possible for one of the fields.
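The equivalence of USING and ON can be checked with a quick sketch in Python's built-in sqlite3 module. The sample rows are made up. One side effect worth knowing, at least in SQLite: with SELECT *, the USING form returns the shared column once, while the ON form returns it once per table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id int, first_name varchar(50), last_name varchar(50));
    CREATE TABLE sales (sale_id int, purchase_date datetime, user_id int);
    INSERT INTO users VALUES (1, 'Jim', 'Hall');
    INSERT INTO sales VALUES (100, '2020-01-01', 1);
""")


def column_names(sql):
    """Return the result column names a query produces."""
    return [col[0] for col in conn.execute(sql).description]


using_rows = conn.execute(
    "SELECT s.*, u.last_name FROM sales s JOIN users u USING (user_id)"
).fetchall()
on_rows = conn.execute(
    "SELECT s.*, u.last_name FROM sales s JOIN users u ON (u.user_id = s.user_id)"
).fetchall()

using_cols = column_names("SELECT * FROM sales s JOIN users u USING (user_id)")
on_cols = column_names("SELECT * FROM sales s JOIN users u ON (u.user_id = s.user_id)")

print(using_rows == on_rows)
print(using_cols)
print(on_cols)
```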
As other answers suggest, it is a personal preference: pick a naming scheme and stick to it.
Some 10 years ago I worked with Oracle Designer, and it uses a naming scheme that I like and have used ever since:
table names are plural - USERS
surrogate primary key is named as singular of table name plus '_id' - primary key for table USERS would be "USER_ID". This way you have consistent naming when you use "USER_ID" field as foreign key in some other table
column names don't have table name as prefix.
Optionally:
in databases with a large number of tables (interpret "large" as you see fit), use 2-3 character table prefixes so that you can logically divide tables into areas. For example: all tables that contain sales data (invoices, invoice items, articles) have the prefix "INV_", and all tables that contain human resources data have the prefix "HR_". That way it is easier to find and sort tables that contain related data (this could also be done by placing tables in different schemas and setting appropriate access rights, but it gets complicated when you need to create more than one database on one server)
Again, pick a naming scheme you like and be consistent.
Just go with name and age, the table should provide the necessary context when you're wondering what kind of name you're working with.
Look at it as an entity and name the fields accordingly
I'd suggest a User table, with fields such as id, name, age, etc.
A group of records is a bunch of users, but the group of fields represents a user.
Thus, you end up referring to user.id, user.name, user.age (though you won't always include the table name, depending on the query).
For the table names, I usually use pluralized nouns (or noun phrases), like you.
For column names I'd not use the table name as prefix. The table itself specifies the context of the column.
table users (plural):
id
name
age
plain and simple.
It's personal preference. The best advice we can give you is consistency, legibility and ensuring the relationships are correctly named as well.
Use names that make sense and aren't abbreviated if possible, unless the storage mechanism you are using doesn't work well with them.
In relationships, I like to use Id on the primary key and [table_name]_Id on the foreign key. eg. Order.Id and OrderItem.OrderId
Id works well if using a surrogate key as a primary key.
Also, your storage mechanism may or may not be case sensitive, so be sure to take that into account.
Edit: Also, there is some theory to suggest that a table should be named after what a single record in that table represents. So, table name "User" instead of "Users". Personally the plural makes more sense to me; just keep it consistent.
First of all, I would suggest using the singular noun, i.e. user instead of users, although this is more of a personal preference.
Second, there are some who prefer to always name the primary key column id instead of user_id (i.e. table name + id), and similarly name instead of employee_name. I think this is a bad idea for the following reason:
-- when every table has an "id" (or "name") column, you get duplicate column names in the output:
select e.id, e.name, d.id, d.name
from employee e, department d
where e.department_id = d.id
-- to avoid this, you need to specify column aliases every time you query:
select e.id employee_id, e.name employee_name, d.id department_id, d.name department_name
from employee e, department d
where e.department_id = d.id
-- if the column name includes the table, there are no conflicts, and the join condition is very clear
select e.employee_id, e.employee_name, d.department_id, d.department_name
from employee e, department d
where e.department_id = d.department_id
I'm not saying you should include the table name in every column in the table, but do it for the key (id) column and other "generic" columns such as name, description, remarks, etc. that are likely to be included in queries.
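Why duplicate output names hurt in practice can be sketched with Python's built-in sqlite3 module. The sample rows are made up, and the explicit INNER JOIN is my substitution for the comma join above. The problem shows up as soon as client code keys results by column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO department VALUES (7, 'Sales');
    INSERT INTO employee VALUES (1, 'Ann', 7);
""")


def fetch_one_as_dict(sql):
    """Run a query and return (column names, first row as a dict)."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return names, dict(zip(names, cur.fetchone()))


# Generic names: the result has two "id" and two "name" columns.
generic_names, generic_row = fetch_one_as_dict("""
    SELECT e.id, e.name, d.id, d.name
    FROM employee e
    INNER JOIN department d ON e.department_id = d.id
""")

# Column aliases (or table-specific column names) keep every value reachable.
aliased_names, aliased_row = fetch_one_as_dict("""
    SELECT e.id AS employee_id, e.name AS employee_name,
           d.id AS department_id, d.name AS department_name
    FROM employee e
    INNER JOIN department d ON e.department_id = d.id
""")

print(generic_names, generic_row)
print(aliased_names, aliased_row)
```

In the first result the employee's id and name are silently overwritten by the department's values when the row is turned into a dict; in the second all four values survive.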
I explicitly named my columns using a prefix that was related to the table
i.e. table = USERS, column name = user_id, user_name, user_address_street, etc.
Before that, when I started using JOINs, I had to alias the crap out of the column names to avoid conflicts in the query results. Then, when the results were accessed from templates in an MVC view, if the query result field name didn't match the published db schema, the template designers would get all confused and have to ask for the SQL VIEW to determine the correct field name to use.
So it looks messy to use a prefix in a column name, but in practice it works better for us.
I'm not entirely sure if there's a standard in the industry
Yes: ISO 11179-5: Naming and identification principles, available here.
I think table and column names should look like this.
Table Name :
User --> Capitalize each word and don't pluralize.
Column Names :
Id --> If I see "Id", I understand that this is the PK column.
GroupId --> I understand that there is a table named Group and that this column is the relation column for the Group table.
Name --> If there is a column named "Name" in the User table, it means the name of the user. That's clear enough.
I'd favor this even more if you are using Entity Framework.
Note: Sorry for my bad English. If somebody will correct my bad English i will be happy.