For a school assignment, I have to create a database and run reports. I've written a query, and a classmate has written one that does the same thing, but his is in a format I haven't seen and don't quite understand.
Here is mine:
SELECT
Course.Name AS 'Course Name',
Program.Name AS 'Program Name'
FROM
Course, Program, ProgramCourse
WHERE
ProgramCourse.CourseID = Course.ID
AND
ProgramCourse.ProgramID = Program.ID
GO
And here's his:
CREATE VIEW NumberOfCoursePerProgram AS
SELECT
p.name AS ProgramName,
c.name AS CourseName
FROM
Program p
JOIN
ProgramCourse pc ON pc.ProgramID = p.ID
JOIN
Course c ON c.ID = pc.CourseID
GO
I ran both queries against the data in the tables I've created. They return practically the same results, just in a slightly different order, and both fulfill the assignment question. However, if I delete the p from Program p in his code, it returns an error:
The multi-part identifier "p.name" could not be bound.
So how is SQL Server able to accept p.name, p.ID, etc. when I haven't ever established these variables? I don't quite understand how his code works. Mine seems simple and straightforward, and I definitely understand what's going on there. Can someone explain his?
Thanks
There are a few differences. First off, he's creating a VIEW rather than just running a SELECT statement:
CREATE VIEW NumberOfCoursePerProgram AS
Once the view is created, you can query the view just as you would a table:
SELECT * FROM NumberOfCoursePerProgram;
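To see that a view stores a query rather than a copy of the data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The sample rows are invented, and SQLite needs no GO batch separator, but the CREATE VIEW statement is otherwise the classmate's:

```python
import sqlite3

# In-memory database with the three tables from the question (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProgramCourse (ProgramID INTEGER, CourseID INTEGER);
    INSERT INTO Program VALUES (1, 'Computing'), (2, 'Business');
    INSERT INTO Course VALUES (10, 'Databases'), (20, 'Accounting');
    INSERT INTO ProgramCourse VALUES (1, 10), (2, 10), (2, 20);

    -- The view saves the query definition, not the rows it returns.
    CREATE VIEW NumberOfCoursePerProgram AS
    SELECT p.Name AS ProgramName, c.Name AS CourseName
    FROM Program p
    JOIN ProgramCourse pc ON pc.ProgramID = p.ID
    JOIN Course c ON c.ID = pc.CourseID;
""")

# Query the view exactly as if it were a table.
rows = conn.execute(
    "SELECT ProgramName, CourseName FROM NumberOfCoursePerProgram "
    "ORDER BY ProgramName, CourseName"
).fetchall()
print(rows)

# New data in the base tables shows up in the view automatically.
conn.execute("INSERT INTO ProgramCourse VALUES (1, 20)")
count = conn.execute("SELECT COUNT(*) FROM NumberOfCoursePerProgram").fetchone()[0]
print(count)
```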
Second, he's using explicit ANSI JOIN syntax rather than an implicit (comma-separated) join. His method is more modern and generally considered better practice today:
JOIN ProgramCourse pc ON pc.ProgramID = p.ID
JOIN Course c ON c.ID = pc.CourseID
Rather than:
FROM Course, Program, ProgramCourse
Also, note he's assigning table aliases when he refers to a table:
FROM Program p
The p at the end allows you to write p rather than the entire table name Program elsewhere in the query. For example, you now say WHERE p.Foo > 5 instead of WHERE Program.Foo > 5 (once a table has an alias, SQL Server expects you to use the alias). In this case, it's just a shortcut that saves a few characters. However, suppose you were referring to the same table twice (for example, joining two different rows of the same table). In that case, you must give at least one of them an alias to disambiguate which one is which.
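A self-join is the classic case where aliases are required. A minimal sketch with Python's sqlite3 module; the sample rows and the "which programs share a course" question are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProgramCourse (ProgramID INTEGER, CourseID INTEGER);
    INSERT INTO ProgramCourse VALUES (1, 10), (2, 10), (2, 20), (3, 20);
""")

# The same table appears twice, so each copy gets its own alias (a and b);
# "ProgramCourse.ProgramID" alone could not say which copy is meant.
pairs = conn.execute("""
    SELECT a.ProgramID, b.ProgramID, a.CourseID
    FROM ProgramCourse a
    JOIN ProgramCourse b
      ON b.CourseID = a.CourseID
     AND b.ProgramID > a.ProgramID   -- skip self-pairs and mirrored duplicates
    ORDER BY a.CourseID
""").fetchall()
print(pairs)  # [(1, 2, 10), (2, 3, 20)]
```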
These are called aliases in SQL. Aliases mainly exist to improve readability and make code easier to write.
The readability of a SELECT statement can be improved by giving a table an alias, also known as a correlation name or range variable. A table alias can be assigned either with or without the AS keyword:
table_name AS table_alias
table_name table_alias
So in your query p is an alias for Program, which means you can now refer to your table Program as p instead of writing the whole name Program everywhere.
Similarly, you can access the columns of Program by writing p, a dot, and then the column name, like p.column. This technique is very useful when you are using JOINs and some of your tables have columns with the same names.
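A minimal sketch of that situation, using Python's sqlite3 module with the question's three tables and invented rows: since both Program and Course have a Name column, an unqualified Name is rejected as ambiguous, while p.Name and c.Name work:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProgramCourse (ProgramID INTEGER, CourseID INTEGER);
    INSERT INTO Program VALUES (1, 'Computing');
    INSERT INTO Course VALUES (10, 'Databases');
    INSERT INTO ProgramCourse VALUES (1, 10);
""")

joins = """
    FROM Program p
    JOIN ProgramCourse pc ON pc.ProgramID = p.ID
    JOIN Course c ON c.ID = pc.CourseID
"""

# A bare "Name" could come from Program or Course, so SQLite refuses it.
try:
    conn.execute("SELECT Name" + joins)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Prefixing each column with its table alias removes the ambiguity.
rows = conn.execute("SELECT p.Name, c.Name" + joins).fetchall()
print(error)
print(rows)  # [('Computing', 'Databases')]
```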
EDIT:
Most of the points are covered in the other answers. I am just adding that you should avoid the habit of joining tables the way you are doing it right now.
You may check Bad habits to kick : using old-style JOINs by Aaron Bertrand for reference.
CREATE VIEW NumberOfCoursePerProgram AS
SELECT
p.name AS ProgramName,
c.name AS CourseName
FROM
Program p
JOIN
ProgramCourse pc ON pc.ProgramID = p.ID
JOIN Course c ON c.ID = pc.CourseID
GO
Observe that all three tables, Program, ProgramCourse, and Course, have a table alias defined.
When a column name exists in more than one table (Name is in both Program and Course here), the SELECT list must specify which table it comes from, which is exactly what you did. Your partner just added aliases to the table names. These aliases are shorter and make the query look a bit less like a big wall of text.
The other difference is the use of joins. Joins are used to link rows from two tables that have a corresponding column.
These columns are usually the primary key of one table and the foreign key in the second table.
Your query is fine, but the join syntax is preferred.
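To check that the two styles return the same rows, here is a minimal sketch using Python's sqlite3 module and invented sample data. Without ORDER BY neither query guarantees a row order, which is likely why the question saw the same results in a different order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProgramCourse (ProgramID INTEGER, CourseID INTEGER);
    INSERT INTO Program VALUES (1, 'Computing'), (2, 'Business');
    INSERT INTO Course VALUES (10, 'Databases'), (20, 'Accounting');
    INSERT INTO ProgramCourse VALUES (1, 10), (2, 10), (2, 20);
""")

# Implicit (comma) join: the join conditions live in the WHERE clause.
implicit = conn.execute("""
    SELECT Course.Name, Program.Name
    FROM Course, Program, ProgramCourse
    WHERE ProgramCourse.CourseID = Course.ID
      AND ProgramCourse.ProgramID = Program.ID
""").fetchall()

# Explicit join: each condition sits next to the table it connects.
explicit = conn.execute("""
    SELECT c.Name, p.Name
    FROM Program p
    JOIN ProgramCourse pc ON pc.ProgramID = p.ID
    JOIN Course c ON c.ID = pc.CourseID
""").fetchall()

# Compare sorted copies, since the row order is not guaranteed.
print(sorted(implicit) == sorted(explicit))  # True
```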
Edit: Once the view is created, its definition is stored in the database, and it can be used in a SELECT just like a table.
SELECT * FROM NumberOfCoursePerProgram
If you need to modify the view, you can use
ALTER VIEW NumberOfCoursePerProgram AS
......
......
There are a few things going on here:
if I delete the "p" from "Program p" from his code, it returns "The multi-part identifier "p.name" could not be bound."
The p is an alias for Program. By doing it that way, whenever you need to refer to Program you can just call it p.
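The error from the question can be reproduced with Python's sqlite3 module as a stand-in. SQLite words it as "no such column" rather than SQL Server's "could not be bound", and the single sample row is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Program VALUES (1, 'Computing');
""")

# With the alias, the FROM clause defines "p", so p.Name resolves.
ok = conn.execute("SELECT p.Name FROM Program p").fetchall()

# Remove the alias and nothing called "p" exists in the query any more.
try:
    conn.execute("SELECT p.Name FROM Program")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(ok)     # [('Computing',)]
print(error)  # SQLite's "no such column" error for p.Name
```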
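As far as what he is doing: it's better to use a JOIN in this case than the multi-table SELECT you are doing. For inner joins the two forms are logically equivalent, and SQL Server's optimizer normally produces the same plan for both, so even if Course and Program had millions of rows the difference would be readability rather than speed. The real risk with your style is that if you forget one of the WHERE conditions, you get every combination of rows (a cross join), which with millions of rows really would be selecting everything. By doing the join on ProgramID you are only getting those items in ProgramCourse that correspond to an entry in Program as well (tied together by ID and CourseID).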
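The cost of leaving out a condition is easy to demonstrate. A minimal sketch with sqlite3 and invented rows (in SQL Server, an INNER JOIN with no ON clause is a syntax error, which is part of why the explicit form is harder to get wrong; SQLite is more permissive, so the sketch sticks to the comma form):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Course (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProgramCourse (ProgramID INTEGER, CourseID INTEGER);
    INSERT INTO Program VALUES (1, 'Computing'), (2, 'Business');
    INSERT INTO Course VALUES (10, 'Databases'), (20, 'Accounting');
    INSERT INTO ProgramCourse VALUES (1, 10), (2, 10), (2, 20);
""")

# One condition forgotten: each ProgramCourse/Course match is paired with every Program.
missing = conn.execute("""
    SELECT COUNT(*)
    FROM Course, Program, ProgramCourse
    WHERE ProgramCourse.CourseID = Course.ID
""").fetchone()[0]

# Both conditions present: one row per ProgramCourse entry.
correct = conn.execute("""
    SELECT COUNT(*)
    FROM Course, Program, ProgramCourse
    WHERE ProgramCourse.CourseID = Course.ID
      AND ProgramCourse.ProgramID = Program.ID
""").fetchone()[0]

print(missing, correct)  # 6 3
```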
One more important note: you are running a plain SELECT statement, whereas he created a VIEW. In SQL, a view acts as a virtual table. He can now run SELECT * FROM NumberOfCoursePerProgram at any time without having to write the joins and the select list again.
Hope that helps...
In SQL Server you can give a table an alias without AS; it's optional.
If you were to add the AS, it would work exactly the same.
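A quick check with Python's sqlite3 module (sample data invented) shows the two forms returning identical results; SQLite, like SQL Server, treats AS as optional for table aliases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Program (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Program VALUES (1, 'Computing'), (2, 'Business');
""")

# Same alias, declared with and without the AS keyword.
with_as = conn.execute("SELECT p.Name FROM Program AS p ORDER BY p.ID").fetchall()
without_as = conn.execute("SELECT p.Name FROM Program p ORDER BY p.ID").fetchall()
print(with_as == without_as)  # True
```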
It may also be worth noting that he is using an explicit JOIN as opposed to a WHERE-clause join, which is my preference because it's easier to read and more in line with current coding practice.
Multiple join conditions can be combined with AND in the ON clause, so you rarely need to fall back to a WHERE-clause join, and certainly not in your case.
I have recently started to learn Oracle, and I am having difficulty understanding this inner join.
INSERT INTO temp_bill_pay_ft
SELECT DISTINCT
ft.ft_id,
ft.ft_credit_acct_no,
ft.ft_debit_acct_no,
ft.ft_stmt_nos,
ft.ft_debit_their_ref,
ft.ft_date_time
FROM
funds_transfer_his ft
INNER JOIN temp_bill_pay_lwday_pl dt
ON ft.ft_id = dt.ac_ste_trans_reference || ';1'
AND ft.ft_credit_acct_no = dt.ac_id;
It is this line specifically that I don't understand. Why do we use || here? I suppose it is for concatenation.
ON ft.ft_id = dt.ac_ste_trans_reference||';1'
Can somebody please explain to me this sql query. I would really appreciate it. Thank you.
This is string concatenation. It is needed because of a design error in the database: the join keys are not the same in the two tables. So the data might look something like this:
ft_id ac_ste_trans_reference
123;1 123
abc;1 abc
In order for the join to work, the keys need to match. One possibility is to remove the last two characters from ft_id, but I'm guessing those are meaningful.
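A minimal sketch of that join in Python's sqlite3 module, which uses the same || operator as Oracle. The rows follow the illustrative data above; the account numbers and the extra '123;2' row are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE funds_transfer_his (ft_id TEXT, ft_credit_acct_no TEXT);
    CREATE TABLE temp_bill_pay_lwday_pl (ac_ste_trans_reference TEXT, ac_id TEXT);
    INSERT INTO funds_transfer_his VALUES ('123;1', 'ACC1'), ('abc;1', 'ACC2'), ('123;2', 'ACC1');
    INSERT INTO temp_bill_pay_lwday_pl VALUES ('123', 'ACC1'), ('abc', 'ACC9');
""")

# '123' || ';1' builds '123;1', which can then be compared with ft_id.
# Both conditions in the ON clause must hold for a row to join.
rows = conn.execute("""
    SELECT ft.ft_id, ft.ft_credit_acct_no
    FROM funds_transfer_his ft
    INNER JOIN temp_bill_pay_lwday_pl dt
      ON ft.ft_id = dt.ac_ste_trans_reference || ';1'
     AND ft.ft_credit_acct_no = dt.ac_id
""").fetchall()
print(rows)  # [('123;1', 'ACC1')]: 'abc;1' fails the account check, '123;2' never matches
```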
I can speculate on why this is so. One possibility is that ft_id is really a compound key combined into a single column -- and the 1 is used to indicate the "type" of key. If so, then there are possibly other values after this:
ft_id
123;1
garbled;2
special;3
The "2" and "3" would refer to other reference tables.
If this is the situation, then it would be cleaner to have a separate column with the correct ac_ste_trans_reference. However that occupies additional space, and can require multiple additional columns for each type. So hacks like the one you see are sometimes implemented.
Yes it is used for concatenation.
But only somebody having worked on this database model can explain what table data represent and why this concatenation is needed for this joining condition.
I'm not sure I can provide enough details for an answer, but my company is having a performance issue with an older MSSQL view. I've narrowed it down to the right outer joins, but I'm not familiar with this structure of joins following joins without an ON for each one, as in the code snippet below.
How do I rewrite the joins below to improve performance, or at least convert them to the simpler JOIN TableName ON Field1 = Field2 format?
FROM dbo.tblObject AS tblObject_2
JOIN dbo.tblProspectB2B PB ON PB.Object_ID = tblObject_2.Object_ID
RIGHT OUTER JOIN dbo.tblProspectB2B_CoordinatorStatus
RIGHT OUTER JOIN dbo.tblObject
INNER JOIN dbo.vwDomain_Hierarchy
INNER JOIN dbo.tblContactUser
INNER JOIN dbo.tblProcessingFile WITH ( NOLOCK )
LEFT OUTER JOIN dbo.enumRetentionRealization AS RR ON RR.RetentionRealizationID = dbo.tblProcessingFile.RetentionLeadTypeID
INNER JOIN dbo.tblLoan
INNER JOIN dbo.tblObject AS tblObject_1 WITH ( NOLOCK ) ON dbo.tblLoan.Object_ID = tblObject_1.Object_ID ON dbo.tblProcessingFile.Loan_ID = dbo.tblLoan.Object_ID ON dbo.tblContactUser.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.vwDomain_Hierarchy.Object_ID = tblObject_1.Domain_ID ON dbo.tblObject.Object_ID = dbo.tblLoan.ContactOwnerID ON dbo.tblProspectB2B_CoordinatorStatus.Object_ID = dbo.tblLoan.ReferralSourceContactID ON tblObject_2.Object_ID = dbo.tblLoan.ReferralSourceContactID
Your last INNER JOIN is followed by a chain of ON clauses. Per this question and answer, such syntax nests the joins, much like a nested subquery: each ON closes the most recent unclosed JOIN, working outwards.
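To make the nesting concrete, here is a minimal sketch in sqlite3 with three hypothetical tables (Loan, Contact, Status), not the real schema. It uses LEFT JOIN, the mirror image of the RIGHT JOINs above, and writes the nested part as a derived table to stay with syntax SQLite certainly accepts. It shows why a flat rewrite needs care: neither obvious flat version returns the same rows as the nested one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Loan (ID INTEGER PRIMARY KEY);
    CREATE TABLE Contact (ID INTEGER PRIMARY KEY, LoanID INTEGER);
    CREATE TABLE Status (ContactID INTEGER, Code TEXT);
    INSERT INTO Loan VALUES (1), (2);
    INSERT INTO Contact VALUES (100, 1), (200, 2);
    INSERT INTO Status VALUES (100, 'A');   -- contact 200 has no status row
""")

# Nested form: Contact INNER JOIN Status is evaluated first, then outer-joined to Loan.
nested = conn.execute("""
    SELECT l.ID, cs.ContactID, cs.Code
    FROM Loan l
    LEFT JOIN (SELECT c.ID AS ContactID, c.LoanID, s.Code
               FROM Contact c
               JOIN Status s ON s.ContactID = c.ID) cs
      ON cs.LoanID = l.ID
    ORDER BY l.ID
""").fetchall()

# Flat rewrite with a trailing INNER JOIN: loan 2 disappears entirely.
flat_inner = conn.execute("""
    SELECT l.ID, c.ID, s.Code
    FROM Loan l
    LEFT JOIN Contact c ON c.LoanID = l.ID
    JOIN Status s ON s.ContactID = c.ID
    ORDER BY l.ID
""").fetchall()

# Flat rewrite with all LEFT JOINs: loan 2 stays, but now shows contact 200.
flat_left = conn.execute("""
    SELECT l.ID, c.ID, s.Code
    FROM Loan l
    LEFT JOIN Contact c ON c.LoanID = l.ID
    LEFT JOIN Status s ON s.ContactID = c.ID
    ORDER BY l.ID
""").fetchall()

print(nested)      # [(1, 100, 'A'), (2, None, None)]
print(flat_inner)  # [(1, 100, 'A')]
print(flat_left)   # [(1, 100, 'A'), (2, 200, None)]
```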
That is one of the worst queries I have ever seen. Since I cannot figure out how it is supposed to work without the underlying data, this is what I suggest to you.
First, find a good sample loan and write a query against this view that returns only that loan (WHERE loan_id = ...). Now you have a data set you can check your changes against much more easily than the possibly millions of records this view returns. Make sure these results make sense (that RIGHT JOIN to tblObject bothers me, as it makes no sense to return all the records in tblObject).
Now start writing your query with what you think should be the first table (I would suggest that tblLoan is the first table; if it is not, then the first table is tblObject left joined to tblLoan), and add the WHERE clause for the loan ID.
Check your results: did you get the same loan information as the view query with the WHERE clause added?
Then add each join one at a time and see how it affects the query and whether the results start going off track. Once you have a query that gives the same results with all the tables added, try several other loan IDs. Once those check out, run the whole query with no WHERE clause and compare it against the view results. If it is a large number of rows, you may need to just check that the record counts match and visually scan through them (use ORDER BY on both queries to make sure the results are in the same order). In the process, try to use only left joins rather than that combination of right and left joins (it's OK to leave the inner ones alone).
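The checking process above can be partly automated. A minimal sketch with sqlite3; the tables, the vwLoan view, and the same_rows helper are all hypothetical. Comparing Counter objects treats the results as multisets, so row order is ignored but duplicate rows still count:

```python
import sqlite3
from collections import Counter

def same_rows(conn, query_a, query_b, params=()):
    """True if both queries return the same rows, counting duplicates, ignoring order."""
    rows_a = Counter(conn.execute(query_a, params).fetchall())
    rows_b = Counter(conn.execute(query_b, params).fetchall())
    return rows_a == rows_b

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Loan (ID INTEGER PRIMARY KEY, OwnerID INTEGER);
    CREATE TABLE Contact (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Loan VALUES (1, 10), (2, 20), (3, NULL);
    INSERT INTO Contact VALUES (10, 'Ann'), (20, 'Bob');
    CREATE VIEW vwLoan AS
    SELECT l.ID AS LoanID, c.Name AS OwnerName
    FROM Loan l LEFT JOIN Contact c ON c.ID = l.OwnerID;
""")

view_sql = "SELECT LoanID, OwnerName FROM vwLoan WHERE LoanID = ?"
# A candidate rewrite that (wrongly) uses an INNER JOIN.
rewrite_sql = """SELECT l.ID, c.Name FROM Loan l
                 JOIN Contact c ON c.ID = l.OwnerID WHERE l.ID = ?"""

# Step 1: check individual sample loans.
per_loan = {loan_id: same_rows(conn, view_sql, rewrite_sql, (loan_id,))
            for loan_id in (1, 2, 3)}
print(per_loan)  # {1: True, 2: True, 3: False}: the INNER JOIN drops loan 3

# Step 2: once the samples agree, compare the full result sets.
full_match = same_rows(
    conn,
    "SELECT LoanID, OwnerName FROM vwLoan",
    "SELECT l.ID, c.Name FROM Loan l LEFT JOIN Contact c ON c.ID = l.OwnerID",
)
print(full_match)  # True
```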
I make it a habit in complex queries to do all the inner joins first and then the left joins. I never use right joins in production code.
Now you are ready to performance tune.
I would guess the RIGHT JOIN to tblObject is causing a problem, in that it returns the whole table; the name of that table, and the other joins to the same table, lead me to believe the author probably wanted a left join. Without knowing the meaning of the data, it is hard to be sure. So first, if you are returning too many records for one loan ID, consider whether the real problem is that, as the tables have grown, returning those extra records has become expensive.
Also consider that you can often take a nested view (vwDomain_Hierarchy here) and replace it with the joins it actually needs to get the same results. Views calling views are a poor technique that often leads to performance issues: the outer views often call the same tables as the inner ones, so you end up joining to them multiple times when you don't need to.
Check your Explain plan or Execution plan depending on what database backend you have. Analysis of this should show where you might have missing indexes.
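SQL Server shows its execution plan in Management Studio; as a rough, self-contained analogue, SQLite's EXPLAIN QUERY PLAN shows the same kind of information. A minimal sketch (the table reuses the names tblLoan and ContactOwnerID from the question, but the data and the index name are invented, and the exact plan wording varies between SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblLoan (Object_ID INTEGER PRIMARY KEY, ContactOwnerID INTEGER)")
conn.executemany("INSERT INTO tblLoan VALUES (?, ?)", [(i, i % 50) for i in range(1000)])

query = "SELECT Object_ID FROM tblLoan WHERE ContactOwnerID = 7"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # a full table scan (a "SCAN ..." step)
conn.execute("CREATE INDEX IX_tblLoan_ContactOwnerID ON tblLoan (ContactOwnerID)")
after = plan(query)   # a "SEARCH ..." step that names the new index

print(before)
print(after)
```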
Also make sure that every table in the query is needed. This is especially true when you join to a view: the view may join 12 other tables when you only need data from one of them, and that table could be joined directly to one of your tables. Make sure you are not using SELECT * but only returning the fields the view actually needs. With inner joins, SELECT * returns the join columns from both sides, so by definition it returns fields you don't need.
If the SELECT part of the view has a DISTINCT in it, consider whether you can weed out the duplicate records that made DISTINCT necessary, by changing to a derived table or adding a WHERE clause. To see what is causing the duplicates, you may need to temporarily use SELECT * to see all the columns and find out which one is not unique.
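One way to find the join that multiplies rows is to group on the key you expect to be unique and look for counts above 1. A minimal sketch with sqlite3 and hypothetical Loan and LoanNote tables, followed by one derived-table fix (keeping the alphabetically largest note is an arbitrary choice for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Loan (ID INTEGER PRIMARY KEY);
    CREATE TABLE LoanNote (LoanID INTEGER, Note TEXT);
    INSERT INTO Loan VALUES (1), (2);
    INSERT INTO LoanNote VALUES (1, 'called'), (1, 'emailed'), (2, 'called');
""")

# Loan 1 has two notes, so the join returns it twice; DISTINCT would hide that.
dupes = conn.execute("""
    SELECT l.ID, COUNT(*) AS n
    FROM Loan l
    JOIN LoanNote ln ON ln.LoanID = l.ID
    GROUP BY l.ID
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [(1, 2)]

# Collapse LoanNote to one row per loan in a derived table, so no DISTINCT is needed.
fixed = conn.execute("""
    SELECT l.ID, ln.MaxNote
    FROM Loan l
    JOIN (SELECT LoanID, MAX(Note) AS MaxNote FROM LoanNote GROUP BY LoanID) ln
      ON ln.LoanID = l.ID
    ORDER BY l.ID
""").fetchall()
print(fixed)  # [(1, 'emailed'), (2, 'called')]
```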
This whole process is not going to be easy or fun. Just take it slowly, work carefully and methodically and you will get there and have a query that is understandable and maintainable in the end.
I have seen a few queries where the alias of the derived table is also used in the query that makes up the derived table. Can anyone confirm if this is allowable or not?
Here is a sample query. Pay attention to how alias "st" is used twice:
SELECT ft.ThisColumn, st.OtherID
FROM FirstTable ft
INNER JOIN
(SELECT st.CommonID,st.OtherID,DateEntered,DateExited,row_number() OVER (PARTITION BY OtherID ORDER BY DateEntered DESC) stRank
FROM SecondTable st
WHERE (@StartDate BETWEEN DateEntered and DateExited)
) st
ON ft.CommonID=st.CommonID AND st.stRank=1
Is it OK to use the same alias "st" in these two different places?
The st inside the derived table is only visible inside that inner query (and only to the clauses that are logically processed after its FROM clause). That is fine, because it is not accessible from the outer context.
The second st is an alias for the derived table's whole result, which is used in the outer query (again, by the clauses processed after the outer FROM clause). That is fine too.
Logically, the outer query's FROM clause is processed first. That evaluates the derived table, and its result (itself a relation) is given the alias st and takes part in your join.
Additional note: SQL Server is built on the relational model, in which relations are mathematical sets. Just as we need a name to refer to a set, every relation in a query (table, view, or table expression such as a derived table or CTE) must have a valid name.
But I advise you not to use two aliases with the same name in one query, even if they belong to different logical processing phases, because it reduces the readability of your query.
In short, your query is correct and valid.
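A quick way to confirm this is to run a simplified version of the query. The sketch below uses Python's sqlite3 module with invented rows, drops the row_number() ranking for brevity, and passes the start date as a ? parameter in place of @StartDate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FirstTable (CommonID INTEGER, ThisColumn TEXT);
    CREATE TABLE SecondTable (CommonID INTEGER, OtherID INTEGER,
                              DateEntered TEXT, DateExited TEXT);
    INSERT INTO FirstTable VALUES (1, 'x'), (2, 'y');
    INSERT INTO SecondTable VALUES
        (1, 100, '2020-01-01', '2020-12-31'),
        (2, 200, '2021-01-01', '2021-12-31');
""")

# "st" names SecondTable inside the derived table and, separately,
# the derived table itself in the outer query; the two scopes do not overlap.
rows = conn.execute("""
    SELECT ft.ThisColumn, st.OtherID
    FROM FirstTable ft
    INNER JOIN (SELECT st.CommonID, st.OtherID
                FROM SecondTable st
                WHERE ? BETWEEN st.DateEntered AND st.DateExited) st
      ON ft.CommonID = st.CommonID
""", ("2020-06-15",)).fetchall()
print(rows)  # [('x', 100)]
```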
The question:
Find the title of the books whose keyword contains the last 3 characters of the bookgroup of the book which was booked by "Mr. Karim".
I am trying to do it like this:
SELECT Title
FROM Lib_Book
WHERE BookKeywords LIKE '%(SELECT BookGroup FROM Lib_Book WHERE BookId=(SELECT BookId from Lib_Booking, Lib_Borrower WHERE Lib_Booking.BId=Lib_Borrower.BId AND Lib_Borrower.BName = 'Mr. Karim'))%';
The subquery between the % signs returns 'programming' when I run it on its own, so I need the BookKeywords pattern to be '%ing%'. How can I do that?
(The tables are huge, so I haven't included them here. If anyone needs them, please let me know. Thanks.)
You have the basic concept down, but you can't put a SELECT statement inside the string literal of a LIKE pattern: everything between the quotes is treated as literal text (and I'd probably shoot the RDBMS developer who allowed that; it would be a GAPING HUGE HOLE for SQL injection). Also, you're likely to have problems with multiple results, as your 'query' would fail the moment Mr. Karim had borrowed more than one book.
I'd probably start by attempting to make things run off of joins (oh, and never use implicit join syntax):
SELECT d.title
FROM Lib_Borrower as a
JOIN Lib_Booking as b
ON b.bId = a.bId
JOIN Lib_Book as c
ON c.bookId = b.bookId
JOIN Lib_Book as d
ON d.bookKeywords LIKE '%' + RIGHT(c.bookGroup, 3) + '%'
WHERE a.bName = 'Mr. Karim'
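For comparison, here is the same join in SQLite via Python's sqlite3, where the last three characters come from substr(x, -3) and strings are concatenated with || (Oracle is similar, with SUBSTR and ||). The sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lib_Borrower (BId INTEGER PRIMARY KEY, BName TEXT);
    CREATE TABLE Lib_Book (BookId INTEGER PRIMARY KEY, Title TEXT,
                           BookGroup TEXT, BookKeywords TEXT);
    CREATE TABLE Lib_Booking (BId INTEGER, BookId INTEGER);
    INSERT INTO Lib_Borrower VALUES (1, 'Mr. Karim');
    INSERT INTO Lib_Book VALUES
        (1, 'Learning C', 'programming', 'c, coding'),
        (2, 'Data Mining', 'science', 'mining, data'),
        (3, 'Gardening', 'hobby', 'plants');
    INSERT INTO Lib_Booking VALUES (1, 1);
""")

# substr(x, -3) returns the last three characters ('ing' for 'programming');
# || glues the % wildcards around them to build the LIKE pattern.
titles = conn.execute("""
    SELECT DISTINCT d.Title
    FROM Lib_Borrower a
    JOIN Lib_Booking b ON b.BId = a.BId
    JOIN Lib_Book c ON c.BookId = b.BookId
    JOIN Lib_Book d ON d.BookKeywords LIKE '%' || substr(c.BookGroup, -3) || '%'
    WHERE a.BName = 'Mr. Karim'
    ORDER BY d.Title
""").fetchall()
print(titles)  # [('Data Mining',), ('Learning C',)]
```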
Please note the following caveats:
This will get you all titles for all books with similar keywords to all borrowed books (from all 'Mr. Karim's). You may need to include some sort of restrictive criteria while joining to Lib_Booking.
The column bookKeywords seems potentially like a multi-value column (would need example data). If so, your table structure needs to be revised.
The use of SUBSTRING() or RIGHT() will prevent the use of indexes when joining on the bookGroup column (as will the leading % wildcard on bookKeywords). There isn't much you can necessarily do about that, given your requirements...
This table design is not internationalization-safe (because the bookGroup column is language-dependent parsed text). You may find yourself better served by creating a Book_Group table, a Keywords table, and a cross-reference Book_Keywords table, and joining on numeric IDs. You may also want language-keyed Book_Group_Description and Keyword_Description tables. This will take more space, and probably take more processing time (increased number of joins, although potentially less textual processing), but give you increased flexibility and 'safety'.
Out of interest, when working with SQL statements, should I always use the fully qualified column name (tablename.columnname) even if I'm only working with one table? e.g.
SELECT table.column1, table.column2 FROM table
It's better if you do - it doesn't add any complexity, and it can prevent errors in the future.
But in a well-defined system, you shouldn't have to - it's like namespaces in programming languages. The ideal is not to have conflicts, but it can clutter the code with the superfluous use of explicit names.
-Adam
I generally follow these rules:
When using a single table, it is not necessary to use the table name prefix:
SELECT col1, col2 FROM table1
For multiple tables, use the full table name. Aliases can be confusing, especially when doing multiple joins:
SELECT table1.col1, table2.col2 FROM table1 INNER JOIN table2 on
table1.id = table2.id
I see many developers using table aliases, but particularly in large projects with multiple developers, these can become cryptic. A few extra keystrokes can provide a lot more clarity in the code.
It may, however, become necessary to use a column alias when columns have the same name. In that case:
SELECT table1.col1, table2.col1 as table2_col1 FROM table1
INNER JOIN table2 on
table1.id = table2.id
I would put this as personal preference. It would only make a difference if you started joining tables which contain duplicate column names.
Also, rather than write out the table name in full, use an alias:
SELECT t.column1, t.column2 FROM table as t
If you are only selecting from one table I do not see the overall usefulness. If you are selecting from multiple tables, qualifying the column names would certainly make it easier to read for any other developer who may not be familiar with your database schema.
If you're only querying one table - I'd say no. It's more readable that way.
Don't solve problems you don't have yet. (At least that's what my team lead is always telling me.) I'm sure the monkey who someday has to add a JOIN to your statement can figure it out.
I think it is a good idea to always use the fully qualified column name. Use an alias if the table name is too long. It also prepares your queries for future additions such as joins.
I would say it is good to use qualified names; it adds readability to your code. It does not make much sense for a single table, but for multiple tables it is a must. If table names are too long, use an alias, preferably one derived from the table name.
SELECT Dep.Name, Emp.Name
FROM Department Dep INNER JOIN Employee Emp
ON Dep.DepartmentID = Emp.DepartmentID
No.
You should always alias the tables, and you should always qualify your column names with the table aliases.
select
p.FirstName,
p.LastName,
p.SSN
from Person p
where p.ID = 345
I prefer to use a table alias when more than 1 table is in play.
Even the most well defined systems are subject to change. A new field could quite easily be introduced to a table causing ambiguity in an existing query.
Fully qualifying them protects you from this scenario.
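The scenario in the last answer is easy to reproduce. A minimal sketch with Python's sqlite3 module and invented Department and Employee rows: an unqualified query works until a Name column is added to a second table, while the qualified version keeps working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, DepartmentID INTEGER);
    INSERT INTO Department VALUES (1, 'Sales');
    INSERT INTO Employee VALUES (1, 'Ann', 1);
""")

unqualified = """SELECT Name, DeptName FROM Department d
                 JOIN Employee e ON e.DepartmentID = d.DepartmentID"""
qualified = """SELECT e.Name, d.DeptName FROM Department d
               JOIN Employee e ON e.DepartmentID = d.DepartmentID"""

before = conn.execute(unqualified).fetchall()  # works: only Employee has Name

# Later, someone adds a Name column to Department as well.
conn.execute("ALTER TABLE Department ADD COLUMN Name TEXT")

try:
    conn.execute(unqualified).fetchall()
    broke = False
except sqlite3.Error:
    broke = True  # Name is now ambiguous

after = conn.execute(qualified).fetchall()  # the qualified query is unaffected
print(before, broke, after)  # [('Ann', 'Sales')] True [('Ann', 'Sales')]
```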