SQL Server: Using columns with identical names - sql

I'm writing a migration script to move data from one data model to another in Microsoft SQL Server Management Studio. The problem I'm running into is that, in the source database, some tables have foreign key columns that I need to compare. A snippet of code:
INSERT INTO TargetDB.dbo.Encounter(EncounterID, PATID, DRG)
Select
visit_occurrence_id,
person_id,
(Select
Case when ((Select top 1 observation_concept_id from SourceDB.dbo.Observation where visit_occurrence_id = visit_occurrence_id) = 3040464)
Then (Select top 1 value_as_string from SourceDB.dbo.Observation where visit_occurrence_id = visit_occurrence_id)
Else NULL End
)
from SourceDB.dbo.Visit_occurrence
As you can see, I need to compare visit_occurrence_id in SourceDB.dbo.Observation to visit_occurrence_id in SourceDB.dbo.Visit_occurrence. As it is, it's just returning values from the first row in SourceDB.dbo.Observation, since visit_occurrence_id will always equal itself.
What's the proper way to do this? Can I assign the first visit_occurrence_id value to a variable within the query, so it has a distinct name? I'm pretty lost here.

I'm going to add a little more detail for you here in an answer. You can always refer to an object by it's fully-qualified name, but it isn't always necessary:
Database.Schema.Table
or
Database.Schema.Table.Column
with sql server, it can even include server for linked-server scenarios.
also true of other objects like views, procedures, functions, etc... Aliasing of tables and/or columns can be a good strategy for shortening this qualification.
Anytime there is ambiguity, this is necessary. However, it is a good practice to be fairly explicit, because it can save you future headaches. As an example, consider this view:
CREATE VIEW vwEmployeesWithLocation AS
SELECT
E.EmployeeId -- from employees
, LastName -- from employees
, Status -- from employees
, LocationName -- from locations
FROM
Employees AS E
INNER JOIN
EmployeeLocations AS EL ON E.EmloyeeId = EL.EmployeeId
INNER JOIN
Locations AS L ON EL.LocationId = L.LocationId
Right now, everything is fine because other than EmployeeId, the column names are distinct. However, someone might add a Status column to the Locations table in the future and break this view. So, it would be better to explicitly include the table prefix for all columns in the select.
In your case, your query is cross database, so again, be explicit about the database in all parts of your query.

Used snow_FFFFFF's answer in the comments: Just used SourceDB.dbo.Observation.visit_occurence_id.

Related

Difference between DELETE and DELETE FROM in SQL?

Is there one? I am researching some stored procedures, and in one place I found the following line:
DELETE BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = #DoctorName)
Would that do the same thing as:
DELETE FROM BI_Appointments
WHERE VisitType != (
SELECT TOP 1 CheckupType
FROM BI_Settings
WHERE DoctorName = #DoctorName)
Or is it a syntax error, or something entirely different?
Assuming this is T-SQL or MS SQL Server, there is no difference and the statements are identical. The first FROM keyword is syntactically optional in a DELETE statement.
http://technet.microsoft.com/en-us/library/ms189835.aspx
The keyword is optional for two reasons.
First, the standard requires the FROM keyword in the clause, so it would have to be there for standards compliance.
Second, although the keyword is redundant, that's probably not why it's optional. I believe that it's because SQL Server allows you to specify a JOIN in the DELETE statement, and making the first FROM mandatory makes it awkward.
For example, here's a normal delete:
DELETE FROM Employee WHERE ID = #value
And that can be shortened to:
DELETE Employee WHERE ID = #value
And SQL Server allows you to delete based on another table with a JOIN:
DELETE Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
If the first FROM keyword were not optional, the query above would need to look like this:
DELETE FROM Employee
FROM Employee
JOIN Site
ON Employee.SiteID = Site.ID
WHERE Site.Status = 'Closed'
This above query is perfectly valid and does execute, but it's a very awkward query to read. It's hard to tell that it's a single query. It looks like two got mashed together because of the "duplicate" FROM clauses.
Side note: Your example subqueries are potentially non-deterministic since there is no ORDER BY clause.
Hi friends there is no difference between delete and delete from in oracle database it is optional, but this is standard to write code like this
DELETE FROM table [ WHERE condition ]
this is sql-92 standard. always develop your code in the standard way.

Create new table from average of multiple columns in multiple tables

I have the following query:
CREATE TABLE Professor_Average
SELECT Instructor, SUM( + instreffective_avg + howmuchlearned_avg + instrrespect_avg)/5
FROM instreffective_average, howmuchlearned_average, instrrespect_average
GROUP BY Instructor;
It is telling me that Instructor is ambiguous. How do I fix this?
Qualify instructor with the name of the table it came from.
For example: instreffective_average.Instructor
If you don't do this, SQL will guess which table of the query it came from, but if there are 2 or more possibilities it doesn't try to guess and tells you it needs help deciding.
Your query most likely fails in more than one way.
In addition to what #Patashu told you about table-qualifying column names, you need to JOIN your tables properly. Since Instructor is ambiguous in your query I am guessing (for lack of information) it could look like this:
SELECT ie.Instructor
,SUM(ie.instreffective_avg + h.howmuchlearned_avg + ir.instrrespect_avg)/5
FROM instreffective_average ie
JOIN howmuchlearned_average h USING (Instructor)
JOIN instrrespect_average ir USING (Instructor)
GROUP BY Instructor
I added table aliases to make it easier to read.
This assumes that the three tables each have a column Instructor by which they can be joined. Without JOIN conditions you get a CROSS JOIN, meaning that every row of every table will be combined with every row of every other table. Very expensive nonsense in most cases.
USING (Instructor) is short syntax for ON ie.Instructor = h.Instructor. It also collapses the joined (necessarily identical) columns into one. Therefore, you would get away without table-qualifying Instructor in the SELECT list in my example. Not every RDBMS supports this standard-SQL feature, but you failed to provide more information.

SQL Table / Sub-Query Alias Conventions

I've been writing SQL for a number of years now on various DBMS (Oracle, SQL Server, MySQL, Access etc.) and one thing that has always struck me is the seemingly lack of naming convention when it comes to table & sub-query aliases.
I've always read that table alises are the way to go and although I haven't always used them, when I do I'm always stuck between what names to use. I've gone from using descriptive names to single characters such as 't', 's' or 'q' and back again. Take for example this MS Access query I've just written, I'm still not entirely happy with the aliases I'm using even with a relatively simple query as this, I still don't think it's all that easy to read:
SELECT stkTrans.StockName
, stkTrans.Sedol
, stkTrans.BookCode
, SUM(IIF(stkTrans.TransactionType="S", -1 * stkTrans.Units, 0)) AS [Sell Shares]
, SUM(IIF(stkTrans.TransactionType="B", stkTrans.Units, 0)) AS [Buy Shares]
, SUM(IIF(stkTrans.TransactionType="B", -1 * stkTrans.Price, 0) * stkTrans1.Min_Units) + SUM(IIF(stkTrans.TransactionType="S", stkTrans.Price, 0) * stkTrans1.Min_Units) AS [PnL]
, "" AS [Comment]
FROM tblStockTransactions AS stkTrans
INNER JOIN (SELECT sT1.BookCode
, sT1.Sedol
, MIN(sT1.Units) AS [Min_Units]
FROM tblStockTransactions sT1
GROUP BY sT1.BookCode, sT1.Sedol
HAVING (SUM(IIF(sT1.TransactionType="S", 1, 0)) > 0
AND SUM(IIF(sT1.TransactionType="B", 1, 0)) > 0)) AS stkTrans1 ON (stkTrans.BookCode = stkTrans1.BookCode) AND (stkTrans.Sedol = stkTrans1.Sedol)
GROUP BY stkTrans.BookCode, stkTrans.StockName, stkTrans.Sedol;
What do you think? Thought I would throw it out there to see what everyone else's feelings are about this.
I don't know of any canonical style rules for naming table/query aliases across databases, although I understand that Oracle recommends abbreviations of three to four characters.
I would generally steer clear of single letter abbreviations, except where the query is sufficiently simple that these should be completely unambiguous to anyone having to maintain the code - typically no more than two or three tables per query.
I would also generally avoid long alias names that conform to the general style of your database table-naming conventions, since it can become unclear what is a database table name and what is an alias.
In the example provided, the alias sT1 inside the inline view is utterly unnecessary, as there is only one table being accessed within that inline view. That leaves one table being joined to one inline view (based on the same table) in the query - in these circumstances, I would use s as the alias for the table, and s1 as the alias for the inline view (to indicate that it was querying the same underlying database table).
DT1 and DT2 seems to be good approach ..
I am in phase of understanding the existing Procedures and some
Procedures use this naming convention DT1,DT2..
It becomes fairly simple to understand ..rather than giving some table short name as alias
If I was using SQL Server I'd probably put the derived table in a Common Table Expression (CTE) with a logical name (to indicate what the table is) then shorten it with a correlation name ("table alias") in the main query (to aid readability) e.g.
WITH StockTransactions__type_S_or_B__smallest_units
AS
(
<derived table query here>
)
SELECT stkTrans.StockName
...
FROM tblStockTransactions AS stkTrans
INNER JOIN StockTransactions__type_S_or_B__smallest_units AS stkTrans1
ON (stkTrans.BookCode = stkTrans1.BookCode) AND (stkTrans.Sedol = stkTrans1.Sedol)
GROUP BY stkTrans.BookCode, stkTrans.StockName, stkTrans.Sedol;
Obviously, this isn't an option in Access so you go straight to the correlation name and lose the full name entirely. This is not ideal but acceptable, IMO.
SQL requires a name to be assigned to a derived table for no reason at all. This example from Hugh Darwen, from which I think we can safely assume that the obligation annoys him:
SELECT DISTINCT E#, TOTAL_PAY
FROM ( SELECT E#, SALARY + BONUS AS TOTAL_PAY
FROM EMP ) AS TEETH_GNASHER
WHERE TOTAL_PAY >= 500
Personally, for such a meaningless requirement I choose the almost meaningless and uncontroversial name, DT1 being a contraction of first derived table and allowing for DT2, DT3, etc e.g.
SELECT DISTINCT E#, TOTAL_PAY
FROM ( SELECT E#, SALARY + BONUS AS TOTAL_PAY
FROM EMP ) AS DT1
WHERE TOTAL_PAY >= 500

Why is selecting specified columns, and all, wrong in Oracle SQL?

Say I have a select statement that goes..
select * from animals
That gives a a query result of all the columns in the table.
Now, if the 42nd column of the table animals is is_parent, and I want to return that in my results, just after gender, so I can see it more easily. But I also want all the other columns.
select is_parent, * from animals
This returns ORA-00936: missing expression.
The same statement will work fine in Sybase, and I know that you need to add a table alias to the animals table to get it to work ( select is_parent, a.* from animals ani), but why must Oracle need a table alias to be able to work out the select?
Actually, it's easy to solve the original problem. You just have to qualify the *.
select is_parent, animals.* from animals;
should work just fine. Aliases for the table names also work.
There is no merit in doing this in production code. We should explicitly name the columns we want rather than using the SELECT * construct.
As for ad hoc querying, get yourself an IDE - SQL Developer, TOAD, PL/SQL Developer, etc - which allows us to manipulate queries and result sets without needing extensions to SQL.
Good question, I've often wondered this myself but have then accepted it as one of those things...
Similar problem is this:
sql>select geometrie.SDO_GTYPE from ngg_basiscomponent
ORA-00904: "GEOMETRIE"."SDO_GTYPE": invalid identifier
where geometrie is a column of type mdsys.sdo_geometry.
Add an alias and the thing works.
sql>select a.geometrie.SDO_GTYPE from ngg_basiscomponent a;
Lots of good answers so far on why select * shouldn't be used and they're all perfectly correct. However, don't think any of them answer the original question on why the particular syntax fails.
Sadly, I think the reason is... "because it doesn't".
I don't think it's anything to do with single-table vs. multi-table queries:
This works fine:
select *
from
person p inner join user u on u.person_id = p.person_id
But this fails:
select p.person_id, *
from
person p inner join user u on u.person_id = p.person_id
While this works:
select p.person_id, p.*, u.*
from
person p inner join user u on u.person_id = p.person_id
It might be some historical compatibility thing with 20-year old legacy code.
Another for the "buy why!!!" bucket, along with why can't you group by an alias?
The use case for the alias.* format is as follows
select parent.*, child.col
from parent join child on parent.parent_id = child.parent_id
That is, selecting all the columns from one table in a join, plus (optionally) one or more columns from other tables.
The fact that you can use it to select the same column twice is just a side-effect. There is no real point to selecting the same column twice and I don't think laziness is a real justification.
Select * in the real world is only dangerous when referring to columns by index number after retrieval rather than by name, the bigger problem is inefficiency when not all columns are required in the resultset (network traffic, cpu and memory load).
Of course if you're adding columns from other tables (as is the case in this example it can be dangerous as these tables may over time have columns with matching names, select *, x in that case would fail if a column x is added to the table that previously didn't have it.
why must Oracle need a table alias to be able to work out the select
Teradata is requiring the same. As both are quite old (maybe better call it mature :-) DBMSes this might be historical reasons.
My usual explanation is: an unqualified * means everything/all columns and the parser/optimizer is simply confused because you request more than everything.

Object Relational Mapping Issues: Suggestions needed

I've been trying to come up with a good design pattern for mapping data contained in relational databases to the business objects I've created but I keep hitting a wall.
Consider the following tables:
TYPE: typeid, description
USER: userid, username, usertypeid->TYPE.typeid, imageid->IMAGE.imageid
IMAGE: imageid, location, imagetypeid->TYPE.typeid
I would like to gather all the information regarding a specific user. Creating a query for this isn't too difficult.
SELECT u.*, ut.*, i.*, it.* FROM user u
INNER JOIN type ut ON ut.typeid = u.usertypeid
INNER JOIN image i ON i.imageid = u.imageid
INNER JOIN type it ON it.typeid = i.imagetypeid
WHERE u.userid = #userid
The problem is that the field names collide and then I'm forced to alias every single field which gets out of hand very quickly.
Does anyone have a decent design pattern for this kind of thing?
I've thought about retrieving multiple results from a single stored procedure and then using a dataset to iterate through each one but I'm worried that some performance issues might bite me in the butt later. For example instead of the above query something like:
SELECT u.*, t.* FROM user u
INNER JOIN type t ON t.typeid = u.usertypeid
WHERE u.userid = #userid;
SELECT i.*, t.* FROM image i
INNER JOIN type t ON t.typeid = i.imagetypeid
INNER JOIN user u ON u.imageid = i.imageid
WHERE u.userid = #userid;
Does that seem like a decent solution? Can anyone foresee any issues with this approach?
Never use the SQL * wildcard in production code. Always spell out all the columns you want to retrieve.
Then aliasing some of them doesn't seem like such a huge amount of extra work.
Re your comment asking for background and reasoning:
Sometimes you don't really need every column from all tables, and fetching them can be needlessly costly (especially for large strings and blobs). There is no SQL syntax for "all columns except the following exceptions."
You can't alias columns that you fetch using the wildcard. Once you need to alias any of the columns, you need to expand the wildcard to list all the columns explicitly.
If the table structure changes, e.g. columns are renamed, reordered, dropped, or added, then the wildcard fetches them all, by position as defined in the tables. This may seem like a convenience, but not when your application depends on columns being in the result set by a given name or in a given position. You can get mysterious bugs where your application displays columns in the wrong order (if referencing columns by position), or shows them as blank (if referencing columns by name).
However, if the SQL query names columns explicitly, you can employ the "Fail Early" principle. This helps debugging, because it leads you directly to the SQL query that needs to be edited to account for the schema change.