How can I Order By a column I don't want displayed in the output? [closed]

I'm trying to generate a query to display specific data in specific grouping and order.
The only issue is that, to get the rows in the proper order, I need to sort by a column that I have not SELECTed and do not want displayed.
When I try to ORDER BY that column, I get the error:
"Column ** is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause."
Is there a syntax that will allow me to do this?
Here's the query I'm working with:
select Pr.EmployeeNo as EmpNo, EmployeeFName as EmpFName, EmployeeLName as EmpLName,
ProjectName, ProjectStartDate as ProjStartDate, JobName as Job, JobRate, HoursWorked as Hours
from Employee as Em join ProjEmp as Pr on Em.EmployeeNo = Pr.EmployeeNo
join Project as Pt on Pr.ProjectID = Pt.ProjectID
join Job as Jb on Em.JobID = Jb.JobID
Group by Pr.EmployeeNo, EmployeeFName, EmployeeLName, ProjectName, ProjectStartDate, JobName, JobRate, HoursWorked

Stop and read the full error again. It tells you exactly what's going on. Hint: the problem is not that it's missing from the SELECT clause. There's no reason you can't do this:
SELECT ColumnA
FROM [Table1]
ORDER BY ColumnB
The problem is you have a GROUP BY clause:
Column is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.
And this makes sense. Say you have this table:
ColumnA | ColumnB | ColumnC
--------+---------+--------
1       | 2       | 3
1       | 4       | 9
2       | 6       | 7
2       | 8       | 5
And then try to run this query:
SELECT ColumnA, MAX(ColumnB)
FROM [Table]
GROUP BY ColumnA
ORDER BY ColumnC
This query tries to ORDER BY ColumnC, but there's more than one value for ColumnC in each group! We have two groups on ColumnA: 1 and 2. Group 1 has two "C" values: 3 and 9. Group 2 also has two "C" values: 7 and 5. Depending on which row is selected, you could end up with different orders.
ColumnA, though, is okay, because it's part of the GROUP BY expression. That means we know what value to use. MAX(ColumnB) is also okay, because MAX() is an aggregate function. It tells us which value from the group to use in a deterministic way. But the ColumnC reference is ambiguous(!), and so is not allowed.
So in the SQL from the question, you are free to use any of these columns for the ORDER BY clause:
Pr.EmployeeNo, EmployeeFName, EmployeeLName, ProjectName, ProjectStartDate, JobName, JobRate, HoursWorked
If you want to use a different column, you must either alter the grouping (and think carefully about the consequences) or apply an aggregate function to that column.
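To make the aggregate option concrete, here is a minimal sketch that sorts groups by an aggregate of a column that never appears in the SELECT list. It runs the SQL through Python's built-in sqlite3 module purely as a stand-in engine (an assumption: the question targets SQL Server, and SQLite is more permissive about ungrouped columns). The table name Table1 is hypothetical, and the rows are the sample table from above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ColumnA INTEGER, ColumnB INTEGER, ColumnC INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 4, 9), (2, 6, 7), (2, 8, 5)],
)

# ColumnC is neither selected nor grouped, but wrapping it in an aggregate
# gives each group exactly one value to sort on.
ordered = conn.execute(
    """
    SELECT ColumnA, MAX(ColumnB)
    FROM Table1
    GROUP BY ColumnA
    ORDER BY MAX(ColumnC)
    """
).fetchall()

# Group 2 (largest ColumnC = 7) sorts before group 1 (largest ColumnC = 9).
print(ordered)
conn.close()
```

Which aggregate to use (MIN, MAX, AVG, ...) depends on what "the order of the group" should mean for your data; MAX is only one possible choice.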

I figured it out. The answer was to add the column to the GROUP BY.
Because I didn't SELECT the column, it won't display. BUT GROUPing BY the column allows me to ORDER BY it, and it won't display because it wasn't actually SELECTED.
Sorry for the trouble.
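One thing worth checking with this fix: adding a column to GROUP BY can split existing groups, so the query may return more rows than before. The sketch below shows the effect on a small sample table. It uses Python's sqlite3 module as a stand-in engine (an assumption, the question is about SQL Server), and Table1 with its columns is a hypothetical example, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ColumnA INTEGER, ColumnB INTEGER, ColumnC INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 4, 9), (2, 6, 7), (2, 8, 5)],
)

# Original grouping: one row per ColumnA value.
by_a = conn.execute(
    "SELECT ColumnA, MAX(ColumnB) FROM Table1 GROUP BY ColumnA ORDER BY ColumnA"
).fetchall()

# ColumnC added to GROUP BY so it can be used in ORDER BY without being selected.
# Every distinct (ColumnA, ColumnC) pair now forms its own group.
by_a_and_c = conn.execute(
    "SELECT ColumnA, MAX(ColumnB) FROM Table1 GROUP BY ColumnA, ColumnC ORDER BY ColumnC"
).fetchall()

print(len(by_a), "rows when grouping by ColumnA only")
print(len(by_a_and_c), "rows when grouping by ColumnA and ColumnC")
conn.close()
```

If the extra column has exactly one value per existing group, the row count stays the same and the fix is harmless; otherwise the aggregates are computed over smaller groups.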

Related

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server. The documentation on MSDN states:
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me think that we can use HAVING without a GROUP BY clause, but when I try to write such a query I am not able to use it.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:
The documentation is correct; i.e. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is the select *, which means select every column's value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set, which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's conceptually as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1 (shorthand only: SQL Server itself rejects a literal constant in GROUP BY).
The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by; in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1.
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be code that monitors your licenses for some software: if you have fewer users than per-user licenses, all's good and you don't need to see a result. If you have more users, you want to know about it. E.g.
declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.
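For comparison, a version of that subquery form that does run only selects what the inner query returns. The sketch below is one way to write it, run through Python's sqlite3 module as a stand-in for SQL Server (an assumption); the helper name sums_over is hypothetical, and the data matches the _abc table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _abc (wage INTEGER)")
conn.executemany("INSERT INTO _abc (wage) VALUES (?)", [(4,), (8,), (15,), (30,), (50,)])


def sums_over(threshold):
    """Return the single aggregated row only if it passes the filter."""
    # Inner query: the implicit single group, reduced to one row.
    # Outer query: the filter that HAVING would otherwise apply.
    return conn.execute(
        """
        SELECT sum_of_wage
        FROM (SELECT SUM(wage) AS sum_of_wage FROM _abc) AS singleRowResult
        WHERE sum_of_wage > ?
        """,
        (threshold,),
    ).fetchall()


kept = sums_over(5)        # the total is 107, so the row survives
dropped = sums_over(1000)  # the total is below 1000, so no rows come back
print(kept, dropped)
```

The outer query can only see sum_of_wage, which is the same reason select * or select wage cannot be combined with the having filter.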
HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 100
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 100
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total
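The claim that the implicit group yields exactly one row even when WHERE matches nothing is easy to check. Here is a small sketch using Python's sqlite3 module as a stand-in engine (an assumption); the employees rows and the 'legal' department are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, wage INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("sales", 60), ("sales", 70), ("ops", 40)],
)

# No employee belongs to 'legal', so WHERE matches zero rows,
# yet the aggregate query without GROUP BY still returns one row.
no_match = conn.execute(
    "SELECT COUNT(*), SUM(wage) FROM employees WHERE department = 'legal'"
).fetchall()

# With an explicit GROUP BY there are no groups to report, so no rows come back.
grouped_no_match = conn.execute(
    "SELECT COUNT(*), SUM(wage) FROM employees "
    "WHERE department = 'legal' GROUP BY department"
).fetchall()

print(no_match)          # one row: a count of 0 and a NULL sum
print(grouped_no_match)  # empty result
conn.close()
```

Because SUM over zero rows is NULL, a condition such as SUM(wage) > 100 is not true for that single row, so HAVING removes it. That is why the EXISTS query above skips departments with no employees.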
In SQL you cannot return plain columns alongside aggregated columns directly. You need to group by the non-aggregated fields,
as in the example below:
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case the table has no identity column; you could add one like this:
ALTER TABLE [dbo].[_abc]
ADD Id_new INT IDENTITY(1, 1)
GO

MS Access select distinct random values

How can I select 4 distinct random values from the field answer in MS Access table question?
SELECT TOP 4 answer,ID FROM question GROUP BY answer ORDER BY rnd(INT(NOW*ID)-NOW*ID)
Gives error message:
Run-time error '3122': Your query does not include the specified
expression 'ID' as part of an aggregate function.
SELECT DISTINCT TOP 4 answer,ID FROM question ORDER BY rnd(INT(NOW*ID)-NOW*ID)
Gives error message:
Run-time error '3093': ORDER BY clause (rnd(INT(NOW*ID)-NOW*ID))
conflicts with DISTINCT.
Edit:
Tried this:
SELECT TOP 4 *
FROM (SELECT answer, Rnd(MIN(ID)) AS rnd_id FROM question GROUP BY answer) AS A
ORDER BY rnd_id;
Seems to work so far.
I suggest:
SELECT TOP 4 answer
FROM question
GROUP BY answer
ORDER BY Rnd(MIN(ID));
I don't think the subquery is necessary. And including the random value on the SELECT doesn't seem useful.
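The same "group first, then shuffle the groups" idea can be tried outside Access. The sketch below is only an analogue: it uses Python's sqlite3 module, where RANDOM() and LIMIT take the place of Rnd() and TOP (an assumption, and not Access syntax). The sample answers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (ID INTEGER PRIMARY KEY, answer TEXT)")
conn.executemany(
    "INSERT INTO question (answer) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("d",), ("e",), ("c",), ("f",)],
)

# GROUP BY collapses duplicate answers to one row each;
# the random sort then picks 4 of those distinct groups.
picked = [
    row[0]
    for row in conn.execute(
        "SELECT answer FROM question GROUP BY answer ORDER BY RANDOM() LIMIT 4"
    )
]

print(picked)  # four different answers, varying from run to run
conn.close()
```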
I created a simple quiz application 2 years ago, and this is the query I use to get random questions from the table (NEWID() is a SQL Server function, so this applies to SQL Server rather than Access).
SELECT TOP 4 * FROM Questions ORDER BY NEWID()

In sql server why cant we use functions when comparing data from two tables [duplicate]

This question already has answers here:
Sql Server : How to use an aggregate function like MAX in a WHERE clause
Can someone please tell me what is wrong with the following query?
select 1
from table1 a,
table2 b
where a.pdate=max(b.pdate)
It does not compile.
The other way to write this query is
set @pdate=pdate from table2
select 1
from table1 a,
table2 b
where a.pdate=max(b.pdate)
But I want to understand what is wrong with the first query.
Thanks
But I want to understand what is wrong with the first query.
The error message tells you something that could be of value to you.
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
The max() function is an aggregate that returns the max value for a set of rows. The where clause is used to filter rows. So if you use an aggregate in the place where you are doing the filtering it is not clear what rows you actually want the max value for.
A rewrite could look like this:
select 1
from dbo.table1 as a
where a.pdate = (
select max(b.pdate)
from dbo.table2 as b
);
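To see the rewrite in action, here is a small self-contained sketch. It uses Python's sqlite3 module in place of SQL Server (an assumption), stores the dates as ISO text, and adds a hypothetical id column to table1 so the matching row is visible in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, pdate TEXT)")
conn.execute("CREATE TABLE table2 (pdate TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "2020-01-01"), (2, "2020-03-15"), (3, "2020-02-10")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?)",
    [("2020-02-10",), ("2020-03-15",), ("2020-01-20",)],
)

# The aggregate lives in its own scalar subquery, so it is evaluated over
# all of table2 first; the outer WHERE then compares each row to that value.
matches = conn.execute(
    """
    SELECT a.id
    FROM table1 AS a
    WHERE a.pdate = (SELECT MAX(b.pdate) FROM table2 AS b)
    """
).fetchall()

print(matches)  # only the table1 row dated 2020-03-15 matches
conn.close()
```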
Even the second query is wrong.
The correct way:
Select @pdate=max(pdate) from table2
select 1
from table1 a where a.pdate=@pdate
or,
select 1
from table1 a where a.pdate=(Select max(pdate) from table2)
If you select another column alongside the aggregated column, then you have to use group by.

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use the COUNT(*) clause, you are asking to return the complete number of rows, instead of aggregating by another condition. Your question of whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you will get one huge aggregate, which is the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that the * means to count everything, including NULLs! If you want to count a specific condition, you should use COUNT(expression), which only counts rows where the expression is not NULL. See the docs about aggregate functions for more details on this topic.
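The difference between COUNT(*), COUNT(expression) and a grouped count can be seen side by side in a small sketch. It uses Python's sqlite3 module as a stand-in for the asker's database (an assumption; SQLite has no my_schema, so the table is just my_table), with made-up rows that include one NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)",
    [("x",), ("x",), ("y",), (None,)],
)

# No GROUP BY: the whole table is one group, so exactly one row comes back.
# COUNT(*) counts every row; COUNT(column_a) skips the NULL.
total_rows, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(column_a) FROM my_table"
).fetchone()

# GROUP BY column_a: one row per distinct value (NULL forms its own group).
per_value = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table GROUP BY column_a ORDER BY column_a"
).fetchall()

print(total_rows, non_null)
print(per_value)
conn.close()
```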
If you don't use the GROUP BY clause at all, COUNT(*) collapses the whole table into a single group, so there is no single value of column_a to show next to the count, which is why the second statement is rejected. By adding GROUP BY 1 you group by the first column in the select list (column_a), so you get one count per distinct value.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should mention the column name, because the column order in the select list may change, which makes this code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;

difference between usage of having clause and where clause [duplicate]

Possible Duplicate:
SQL: What's the difference between HAVING and WHERE?
What is the difference between using a HAVING clause and a WHERE clause? Could anyone explain in detail?
HAVING filters grouped elements,
WHERE filters ungrouped elements.
Example 1:
SELECT col1, col2 FROM table
WHERE col1 = @id
Example 2:
SELECT SUM(col1), col2 FROM table
GROUP BY col2
HAVING SUM(col1) > 10
Because the HAVING condition can only be applied in the second example AFTER the grouping has occurred, you could not rewrite it as a WHERE clause.
Example 3:
SELECT SUM(col1), col2 FROM table
WHERE col1 = @id
GROUP BY col2
HAVING SUM(col1) > 10
demonstrates how you might use both WHERE and HAVING together:
The table data is first filtered by col1 = @id
then the filtered data is grouped
then the grouped data is filtered again by SUM(col1) > 10
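The three steps can be traced on a small data set. The sketch below runs through Python's sqlite3 module (an assumption, any SQL engine should behave the same way here). The table name t and the rows are made up, and the row filter is col1 >= 4 rather than an equality test, simply so that more than one row per group survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(5, "a"), (7, "a"), (3, "b"), (4, "b"), (20, "c"), (1, "c")],
)

# Step 1 (WHERE): rows with col1 < 4 are dropped before any grouping.
# Step 2 (GROUP BY): the remaining rows are grouped by col2 and summed.
# Step 3 (HAVING): groups whose sum is not above 10 are dropped.
with_where = conn.execute(
    """
    SELECT SUM(col1), col2
    FROM t
    WHERE col1 >= 4
    GROUP BY col2
    HAVING SUM(col1) > 10
    ORDER BY col2
    """
).fetchall()

# Same query without the WHERE step, to show that the sums themselves change.
without_where = conn.execute(
    """
    SELECT SUM(col1), col2
    FROM t
    GROUP BY col2
    HAVING SUM(col1) > 10
    ORDER BY col2
    """
).fetchall()

print(with_where)
print(without_where)
conn.close()
```

Group 'c' sums to 20 with the WHERE filter and 21 without it, which shows that WHERE changes the input to the aggregate while HAVING only decides which finished groups are kept.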
WHERE filters rows before they are grouped in GROUP BY clause
while HAVING filters the aggregate values after GROUP BY takes place
HAVING specifies a search for something used in the SELECT statement.
In other words.
HAVING applies to groups.
WHERE applies to rows.
Without a GROUP BY, there is no difference (but HAVING looks strange then)
With a GROUP BY
HAVING is for testing condition on the aggregate (MAX, SUM, COUNT etc)
HAVING column = 1 is the same as WHERE column = 1 (no aggregate on column )
WHERE COUNT(*) = 1 is not allowed.
HAVING COUNT(*) = 1 is allowed
Having is for use with an aggregate such as Sum. Where is for all other cases.
They specify a search condition for a group or an aggregate. But the difference is that HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause. Having Clause is basically used only with the GROUP BY function in a query whereas WHERE Clause is applied to each row before they are part of the GROUP BY function in a query.
As others already said, having is used with group by. The reason is the order of execution: where is executed before group by, and having is executed after it.
Think of it as a matter of where the filtering happens.
When you specify a where clause you filter the input rows to your aggregate function (i.e. I only want the average age of persons living in a specific city). When you specify a having constraint you specify that you only want a certain subset of the averages (I only want to see cities with an average age of 70 years or above).
Having is for aggregate functions, e.g.
SELECT baz, COUNT(*)
FROM foo
GROUP BY baz
HAVING COUNT(*) > 8