Different ways to alias a column - sql

What is the difference between
select empName as EmployeeName from employees
versus
select EmployeeName = empName from employees
from a technical point of view? I'm not sure whether this is specific to SQL Server.
Appreciate your answers.

I'd prefer the first one, since the second one is not portable -
select EmployeeName = empName from employees
is either an error (a syntax error in Oracle, an unknown-column error in SQLite), or it might not give you what you expect (the database compares two columns EmployeeName and empName and returns the comparison result as a boolean/integer), whereas
select empName EmployeeName from employees
is the same as
select empName as EmployeeName from employees
which is my preferred variant.
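To see the portability point in practice, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the helper column_names are made up for illustration. It checks which column name each form produces and what SQLite does with the "alias = expression" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (empName TEXT)")
conn.execute("INSERT INTO employees VALUES ('Ann'), ('Bob')")

def column_names(sql):
    """Return the result-set column names the database reports for a query."""
    return [d[0] for d in conn.execute(sql).description]

# Both standard forms produce a column called EmployeeName.
with_as = column_names("SELECT empName AS EmployeeName FROM employees")
without_as = column_names("SELECT empName EmployeeName FROM employees")

# SQLite parses "EmployeeName = empName" as a comparison. EmployeeName is
# not a column of this table, so the statement is rejected.
try:
    conn.execute("SELECT EmployeeName = empName FROM employees")
    equals_form_error = None
except sqlite3.OperationalError as exc:
    equals_form_error = str(exc)

print(with_as, without_as, equals_form_error)
```

If the table did have a column named EmployeeName, the last query would run and return 0 or 1 per row, which is the "not what you expect" case.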

The main advantage of the second syntax is that it allows the column aliases to be all lined up which can be of benefit for long expressions.
SELECT foo,
       bar,
       baz = ROW_NUMBER() OVER (PARTITION BY foo ORDER BY bar)
FROM T

I don't think there's a technical difference. It's mainly a matter of preference. I go for the second as it's easier to spot columns in big queries, especially if the query is properly indented.

Related

SQL: Using the AND statement with nested queries

I am learning with an online tutorial series and right now we are looking into nested queries. While playing around with this concept I wanted to try combining nested queries, but I am not really sure how, and Google isn't giving me much luck. Once again, it's hard to word things like this. I am using MS SQL.
SELECT EmployeeID, FirstName, LastName
FROM SQLTutorial.dbo.EmployeeDemographics
WHERE EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.EmployeeSalary
WHERE JobTitle = 'DBA')
/*
AND
WHERE EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.WareHouseEmployeeDemographics
WHERE Age = 29)
*/
This is what I thought I could do. From the example I know that the uncommented part works. It gets the ID, First Name, and Last name from EDemo IF the ID is in BOTH EDemo AND ESalary AND their job title is DBA.
Well I wanted to limit those results further by only having the results be those who work in the Warehouse as well AND are 29.
I mean if I run the commented code without the uncommented it works like intended as well, but I am not sure why I cannot just combine the two. I am 90% sure it's b/c I can't use the AND statement like this and I have a sneaking feeling I have to nest the two together and make a nest in a nest.
And it will be like a taco inside taco within a Taco Bell that's inside a KFC that's within a mall that's inside your dream! Sorry I couldn't help myself.
Just remove the second "WHERE" and your query is good to execute. You need only one WHERE clause, within which you can combine all your conditions with AND, OR, etc.
SELECT EmployeeID, FirstName, LastName
FROM SQLTutorial.dbo.EmployeeDemographics
WHERE EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.EmployeeSalary
WHERE JobTitle = 'DBA')
AND EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.WareHouseEmployeeDemographics
WHERE Age = 29)
If you instead want all the employees whose job title is DBA, or who work in the Warehouse and are 29, combine the two conditions with OR:
SELECT EmployeeID, FirstName, LastName
FROM SQLTutorial.dbo.EmployeeDemographics
WHERE EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.EmployeeSalary
WHERE JobTitle = 'DBA')
OR EmployeeID
IN (SELECT EmployeeID
FROM SQLTutorial.dbo.WareHouseEmployeeDemographics
WHERE Age = 29)
As pointed out in the comments, the issue is having two WHEREs. Only the first WHERE is needed; after specifying WHERE, conditions can just be joined using AND, OR, that sort of thing. Uncommenting and removing the second WHERE should fix it. So, instead of "WHERE a=b AND WHERE c=d" just write "WHERE a=b AND c=d".
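A small sketch of the same idea, run against SQLite through Python's sqlite3 module rather than SQL Server. The table names follow the question, but without the SQLTutorial.dbo prefix, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeDemographics (EmployeeID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE EmployeeSalary (EmployeeID INTEGER, JobTitle TEXT);
CREATE TABLE WareHouseEmployeeDemographics (EmployeeID INTEGER, Age INTEGER);
INSERT INTO EmployeeDemographics VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
INSERT INTO EmployeeSalary VALUES (1, 'DBA'), (2, 'DBA'), (3, 'Clerk');
INSERT INTO WareHouseEmployeeDemographics VALUES (1, 29), (3, 29);
""")

# One WHERE clause; the two IN conditions are joined by {op}.
QUERY = """
SELECT EmployeeID FROM EmployeeDemographics
WHERE EmployeeID IN (SELECT EmployeeID FROM EmployeeSalary WHERE JobTitle = 'DBA')
  {op} EmployeeID IN (SELECT EmployeeID FROM WareHouseEmployeeDemographics WHERE Age = 29)
ORDER BY EmployeeID
"""

both = [row[0] for row in conn.execute(QUERY.format(op="AND"))]
either = [row[0] for row in conn.execute(QUERY.format(op="OR"))]

# A second WHERE keyword is rejected by the parser.
try:
    conn.execute(
        "SELECT EmployeeID FROM EmployeeDemographics "
        "WHERE EmployeeID = 1 AND WHERE EmployeeID = 2"
    )
    double_where_error = None
except sqlite3.OperationalError as exc:
    double_where_error = str(exc)

print(both)    # [1]
print(either)  # [1, 2, 3]
print(double_where_error)
```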

must appear in the GROUP BY clause in PostgreSQL

I am getting this error:
ERROR: column "programmer.pname" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select pname, min(age(doj)) from programmer ;
I have a table called programmer and columns dob, doj with date.
Here doj is date of joining.
I want to find the least experienced programmer of all the programmers.
That's my try:
SELECT pname, min(age(doj)) FROM programmer;
and I got the above error.
What is that programmer.pname and what is the correct query for the above?
The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
select pname, min(age(doj))
from programmer
group by pname
To find the least experienced programmer of all the programmers:
select pname
,min(age(doj)) mindoj
from programmer
group by pname
order by mindoj limit 1
or
select pname,doj
from programmer
order by doj desc limit 1
You may have more than one least experienced programmer (programmers who joined on the same, most recent date). In that case use this:
select pname,doj
from programmer
where doj=(select max(doj) from programmer)
"What is that programmer.pname and what is the correct query for the above?"
programmer.pname is just the column pname qualified with its table name, i.e. tablename.columnname.
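A quick way to sanity-check the tie-handling query is to run it on a few rows. The sketch below uses SQLite through Python's sqlite3 module instead of PostgreSQL, so it compares doj directly rather than calling age(); the least experienced programmer is the one with the most recent doj. The rows are invented, and dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programmer (pname TEXT, doj TEXT)")
conn.executemany(
    "INSERT INTO programmer VALUES (?, ?)",
    [("anna", "2015-03-01"), ("ben", "2021-07-15"), ("cara", "2021-07-15")],
)

# The most recent joining date belongs to the least experienced programmer(s).
rows = conn.execute(
    "SELECT pname, doj FROM programmer "
    "WHERE doj = (SELECT MAX(doj) FROM programmer) "
    "ORDER BY pname"
).fetchall()
print(rows)  # [('ben', '2021-07-15'), ('cara', '2021-07-15')]
```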

Many Fields In the Group By Clause

I am learning SQL now, and I have a question. I recently came across a query that had a large number of column names in the group by clause. I've used group by clauses before, but I've only ever seen one column name included in it.
SELECT TransportType.Description, TransportType.CargoCapacity, TransportType.Range, Transport.SerialNumber, Transport.PurchaseDate, Transport.RetiredDate,
MAX(Repair.BeginWorkDate) AS LatestRepairDate
FROM Transport INNER JOIN
TransportType ON Transport.TransportTypeID = TransportType.TransportTypeID LEFT OUTER JOIN
Repair ON Transport.TransportNumber = Repair.TransportNumber
GROUP BY TransportType.Description, TransportType.CargoCapacity, TransportType.Range, Transport.SerialNumber, Transport.PurchaseDate,
Transport.RetiredDate
HAVING (Transport.RetiredDate IS NULL)
ORDER BY TransportType.Description, Transport.SerialNumber
Why are there so many columns in the group by clause?
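Except in MySQL & SQLite (which are lenient about the GROUP BY, with sometimes indeterminate results), most RDBMS require every column in the SELECT list that is not inside an aggregate function (MAX(), MIN(), SUM(), COUNT(), etc.) to be in the GROUP BY. Note that MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, so recent MySQL versions are lenient only when that SQL mode is turned off.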
The behavior of MySQL & SQLite when columns from SELECT aren't listed in GROUP BY is not well defined. If for example, you execute a query like:
SELECT firstname, lastname, COUNT(*) FROM names GROUP BY lastname
MySQL would give you a result without complaint.
However, if your table included two different values of firstname having the same lastname, your resultant COUNT(*) would count both of them while only returning the firstname of one of them. What's more, which firstname MySQL chooses to return isn't defined so you can't really rely on it returning the first of the pair, for example.
From a table like:
firstname  lastname
--------------------
Jane       Smith
John       Smith
Peter      Jones
The not-fully-correct result might be:
firstname  lastname  COUNT(*)
-----------------------------
Jane       Smith     2   <---- wrong!
Peter      Jones     1
Outside MySQL & SQLite, non-aggregated columns in the SELECT list that do not also appear in the GROUP BY will result in a query parse error.
Commonly here on Stack Overflow, we encounter users with questions about the GROUP BY, having just begun working with an RDBMS that is stricter about its usage. If you learn aggregates in MySQL first, chances are you'll need to relearn to do them properly when moving to a different RDBMS.
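The lenient behavior is easy to reproduce. This is a rough sketch using SQLite via Python's sqlite3 module, with the three sample rows from the table above. The count is reliable, but which firstname comes back for Smith is up to the engine, so the code does not assume either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Jane", "Smith"), ("John", "Smith"), ("Peter", "Jones")],
)

# firstname is neither grouped nor aggregated; SQLite accepts this anyway.
lenient = conn.execute(
    "SELECT firstname, lastname, COUNT(*) FROM names "
    "GROUP BY lastname ORDER BY lastname"
).fetchall()

# Portable version: every selected column is grouped or aggregated.
strict = conn.execute(
    "SELECT lastname, COUNT(*) FROM names GROUP BY lastname ORDER BY lastname"
).fetchall()

print(lenient)
print(strict)  # [('Jones', 1), ('Smith', 2)]
```

The second query is the form that a stricter RDBMS will also accept.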

Select and Group by together

I have my query like this:
Select
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy,
p.test,
max(b.this),
sum(b.sugar),
sum(b.bucket),
sum(b.something)
followed by some outer joins and inner joins. When the group by is
group by
a.abc,
a.cde,
a.efg,
a.agh,
c.dummy,
p.test
The query works fine. But if I remove any one of them from group by it gives:
SQLSTATE: 42803
Can anyone explain the cause of this error?
Generally, any column that isn't in the group by section can only be included in the select section if it has an aggregating function applied to it. Or, another way, any non-aggregated data in the select section must be grouped on.
Otherwise, how would the DBMS know what you want done with it? For example, if you group on a.abc, there can only be one thing that a.abc can be for that grouped row (since all other values of a.abc will come out in a different row). Here's a short example, with a table containing:
LastName  FirstName  Salary
--------  ---------  ------
Smith     John       123456
Smith     George     111111
Diablo    Pax        999999
With the query select LastName, Salary from Employees group by LastName, you would expect to see:
LastName  Salary
--------  ------
Smith     ??????
Diablo    999999
The salary for the Smiths is incalculable since you don't know what function to apply to it, which is what's causing that error. In other words, the DBMS doesn't know what to do with 123456 and 111111 to get a single value for the grouped row.
If you instead used select LastName, sum(Salary) from Employees group by LastName (or max() or min() or avg() or any other aggregating function), the DBMS would know what to do. For sum(), it will simply add them and give you 234567.
In your query, the equivalent of trying to use Salary without an aggregating function is to change max(b.this) to just b.this but not include it in the group by section. Or alternatively, remove one of the group by columns without changing it to an aggregation in the select section.
In both cases, you'll have one row that has multiple possible values for the column.
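The Employees example can be checked with a short sketch, here run on SQLite through Python's sqlite3 module rather than DB2. The table and rows are the ones from the example above. Once an aggregate is applied to Salary, each grouped row has exactly one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (LastName TEXT, FirstName TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Smith", "John", 123456), ("Smith", "George", 111111), ("Diablo", "Pax", 999999)],
)

# Each query returns (LastName, value) pairs, one per group.
totals = dict(conn.execute(
    "SELECT LastName, SUM(Salary) FROM Employees GROUP BY LastName"
))
highest = dict(conn.execute(
    "SELECT LastName, MAX(Salary) FROM Employees GROUP BY LastName"
))

print(totals)
print(highest)
```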
The DB2 docs at publib for sqlstate 42803 describe your problem:
A column reference in the SELECT or HAVING clause is invalid, because it is not a grouping column; or a column reference in the GROUP BY clause is invalid.
SQL will insist that any column in the SELECT section is either included in the GROUP BY section or has an aggregate function applied to it in the SELECT section.
This article gives a nice explanation of why this is the case. The article is SQL Server specific, but the principle should be roughly the same for all RDBMS.

How to randomize order of data in 3 columns

I have 3 columns of data in SQL Server 2005 :
LASTNAME
FIRSTNAME
CITY
I want to randomly re-order these 3 columns (and munge the data) so that the data is no longer meaningful. Is there an easy way to do this? I don't want to change any data, I just want to re-order the index randomly.
When you say "re-order" these columns, do you mean that you want some of the last names to end up in the first name column? Or do you mean that you want some of the last names to get associated with a different first name and city?
I suspect you mean the latter, in which case you might find a programmatic solution easier (as opposed to a straight SQL solution). Sticking with SQL, you can do something like:
UPDATE the_table
SET lastname = (SELECT lastname FROM the_table ORDER BY RAND())
Depending on what DBMS you're using, this may work for only one line, may make all the last names the same, or may require some variation of syntax to work at all, but the basic approach is about right. Certainly some trials on a copy of the table are warranted before trying it on the real thing.
Of course, to get the first names and cities to also be randomly reordered, you could apply a similar query to either of those columns. (Applying it to all three doesn't make much sense, but wouldn't hurt either.)
Since you don't want to change your original data, you could do this in a temporary table populated with all rows.
Finally, if you just need a single random value from each column, you could do it in place without making a copy of the data, with three separate queries: one to pick a random first name, one a random last name, and the last a random city.
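For the programmatic route mentioned above, one possible sketch is to read the columns out, shuffle each one independently, and write the scrambled rows to a temporary table so the original stays untouched. It uses Python's sqlite3 and random modules instead of SQL Server. The table name the_table comes from the query above, while the scrambled table and the sample rows are assumptions for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (lastname TEXT, firstname TEXT, city TEXT)")
original = [
    ("Doe", "John", "Chicago"),
    ("Foe", "James", "Austin"),
    ("Fan", "Django", "Portland"),
]
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", original)

# Read each column into its own list and shuffle the lists independently.
rows = conn.execute("SELECT lastname, firstname, city FROM the_table").fetchall()
columns = [list(col) for col in zip(*rows)]
for col in columns:
    random.shuffle(col)

# Write the scrambled combinations to a separate table; the_table is untouched.
conn.execute("CREATE TEMP TABLE scrambled (lastname TEXT, firstname TEXT, city TEXT)")
conn.executemany("INSERT INTO scrambled VALUES (?, ?, ?)", zip(*columns))
scrambled = conn.execute("SELECT lastname, firstname, city FROM scrambled").fetchall()
print(scrambled)
```

Every value still appears exactly once per column, but with a random shuffle a row can occasionally keep its original combination, so check the output if that matters.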
I suggest using NEWID() with CHECKSUM() for randomization:
SELECT LASTNAME, FIRSTNAME, CITY FROM table ORDER BY CHECKSUM(NEWID())
In SQL Server 2005+ you could prepare a ranked rowset containing the three target columns and three additional computed columns filled with random rankings (one for each of the three target columns). Then the ranked rowset would be joined with itself three times using the ranking columns, and finally each of the three target columns would be pulled from their own instance of the ranked rowset. Here's an illustration:
WITH sampledata (FirstName, LastName, CityName) AS (
SELECT 'John', 'Doe', 'Chicago' UNION ALL
SELECT 'James', 'Foe', 'Austin' UNION ALL
SELECT 'Django', 'Fan', 'Portland'
),
ranked AS (
SELECT
*,
FirstNameRank = ROW_NUMBER() OVER (ORDER BY NEWID()),
LastNameRank = ROW_NUMBER() OVER (ORDER BY NEWID()),
CityNameRank = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM sampledata
)
SELECT
fnr.FirstName,
lnr.LastName,
cnr.CityName
FROM ranked fnr
INNER JOIN ranked lnr ON fnr.FirstNameRank = lnr.LastNameRank
INNER JOIN ranked cnr ON fnr.FirstNameRank = cnr.CityNameRank
This is the result:
FirstName  LastName  CityName
---------  --------  --------
James      Fan       Chicago
John       Doe       Portland
Django     Foe       Austin
select *, rand() from table order by rand();
I understand some SQL dialects have a rand() that is evaluated only once per query, so it doesn't change for each row (SQL Server's RAND() behaves this way). Check yours. This works on MySQL.