Average when meeting where clause - sql

I want to find the average amount of a field where it meets a criterion. It is embedded in a big table but I would like this average field in there instead of doing it in a separate table.
This is what I have so far:
Select....
Avg( (currbal) where (select * from table
where ament2 in ('r1','r2'))
From table

If you want to AVG only a subset of the rows in a query, use case when ... then ... end to replace the value in non-matching rows with null, as nulls are ignored by avg().
Select id,
sum(something) SomethingSummed,
avg(case when ament2 in ('r1','r2') then currbal end) CurrbalAveragedForR1R2
From [table]
group by id
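A minimal runnable sketch of the CASE-inside-AVG technique, using Python's built-in sqlite3 module against an in-memory SQLite database. The table name `accounts` and its rows are hypothetical stand-ins for the question's `[table]`; only the query shape is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database; "accounts" is a hypothetical stand-in for [table].
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, ament2 TEXT, currbal REAL, something INTEGER);
    INSERT INTO accounts VALUES
        (1, 'r1', 100, 1),
        (1, 'r2', 300, 2),
        (1, 'x9', 999, 3),
        (2, 'x9',  50, 4);
""")

rows = conn.execute("""
    SELECT id,
           SUM(something) AS SomethingSummed,
           -- rows outside r1/r2 become NULL, which AVG() skips
           AVG(CASE WHEN ament2 IN ('r1', 'r2') THEN currbal END) AS CurrbalAveragedForR1R2
    FROM accounts
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Note that SUM(something) still sees every row of each group (1 + 2 + 3 = 6 for id 1); only the average is restricted, and a group with no r1/r2 rows gets NULL (None) rather than an error.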

You can put the filter, together with any other sums you want alongside the AVG, inside a derived table in the FROM clause. Something like:
SELECT AVG(currbal)
FROM
(
SELECT * -- other sums
FROM table
WHERE ament2 IN ('r1','r2')
) t

You can write a full sub-select into the select list:
SELECT ...,
(SELECT AVG(Currbal) FROM Table WHERE ament2 IN ('r1', 'r2')) AS avg_currbal,
...
FROM ...
Whether that will do exactly what you want depends on a number of things. You might need to make that into a correlated subquery; assuming 'ament2' is in Table, it is not a correlated sub-query at the moment.
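To illustrate the difference between the non-correlated and the correlated form, here is a sketch using Python's sqlite3 module. The `accounts` table, its `id` column and the data are hypothetical; the point is only that the uncorrelated subquery returns the same value on every row, while the correlated one is re-evaluated per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER, ament2 TEXT, currbal REAL);
    INSERT INTO accounts VALUES
        (1, 'r1', 100), (1, 'r2', 300), (2, 'r1', 50), (2, 'x9', 999);
""")

# Uncorrelated: the subquery ignores the outer row, so every row gets the same value.
uncorrelated = conn.execute("""
    SELECT DISTINCT o.id,
           (SELECT AVG(i.currbal) FROM accounts i
             WHERE i.ament2 IN ('r1', 'r2')) AS avg_currbal
    FROM accounts o
    ORDER BY o.id
""").fetchall()

# Correlated: the subquery refers to o.id, so it is evaluated per outer id.
correlated = conn.execute("""
    SELECT DISTINCT o.id,
           (SELECT AVG(i.currbal) FROM accounts i
             WHERE i.ament2 IN ('r1', 'r2') AND i.id = o.id) AS avg_currbal
    FROM accounts o
    ORDER BY o.id
""").fetchall()

print(uncorrelated)  # [(1, 150.0), (2, 150.0)]
print(correlated)    # [(1, 200.0), (2, 50.0)]
```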

Related

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server, and the documentation on MSDN states:
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me think that we can use HAVING without a GROUP BY clause, but when I try to write such a query I am not able to use it.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:
The documentation is correct; e.g. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is the select *, which means "select every column's value". When there is no group by, all records are aggregated into a single group; i.e. you only get 1 record in your result set, and it has to represent every record. As such, you can only* include values produced by applying aggregate functions to your columns, not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
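The two queries above (filter then sum, and group by a CASE expression) can be tried out in SQLite via Python's sqlite3 module. This sketch uses a local `_abc` table with the question's sample wages; SQLite has no `dbo` schema, so the table is unqualified.

```python
import sqlite3

# SQLite stand-in for [dbo].[_abc] with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _abc (wage INTEGER);
    INSERT INTO _abc (wage) VALUES (4), (8), (15), (30), (50);
""")

# Filter rows first, then aggregate: sum of wages over 5.
sum_over_5 = conn.execute("SELECT SUM(wage) FROM _abc WHERE wage > 5").fetchone()[0]

# Group by the CASE expression: one row per bucket (0 = wage <= 5, 1 = wage > 5).
buckets = conn.execute("""
    SELECT CASE WHEN wage > 5 THEN 1 ELSE 0 END AS wage_over_five,
           SUM(wage) AS sum_of_wage
    FROM _abc
    GROUP BY CASE WHEN wage > 5 THEN 1 ELSE 0 END
    ORDER BY wage_over_five
""").fetchall()

print(sum_over_5)  # 103
print(buckets)     # [(0, 4), (1, 103)]
```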
See runnable examples here.
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1. That is a mental model rather than literal syntax: SQL Server rejects a constant in GROUP BY, and in databases that do accept group by 1 (e.g. PostgreSQL, MySQL) it means "the first column of the select list".
The documentation merely means that, whilst normally you'd have a having clause with a group by clause, it's OK to omit the group by; in such cases the having clause, like the select list, will treat your query as if you'd specified group by 1.
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be code that monitors your licenses for some software: if you have fewer users than per-user licenses, all's good and you don't want to see a result, since you don't care; if you have more users, you want to know about it. E.g.
declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data; select says what columns/information to bring back. As such, you have to ask "what would the result look like?" Given we've effectively had to apply group by 1 to make use of the having clause, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show them.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement, it could be achieved in a variety of ways, one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
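Another of those ways, sketched here with Python's sqlite3 module, is a scalar subquery in the WHERE clause. This is a swapped-in variant rather than the EXISTS/HAVING query above, chosen because SQLite releases before 3.39 reject HAVING without GROUP BY; the `_abc` table mirrors the question's sample data (total wage 107).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _abc (wage INTEGER);
    INSERT INTO _abc (wage) VALUES (4), (8), (15), (30), (50);
""")

def rows_if_total_exceeds(threshold):
    # The scalar subquery computes the table total; the WHERE clause then keeps
    # either every row (total > threshold) or none of them.
    return conn.execute("""
        SELECT wage
        FROM _abc
        WHERE (SELECT SUM(wage) FROM _abc) > ?
        ORDER BY wage
    """, (threshold,)).fetchall()

print(rows_if_total_exceeds(5))     # all five rows
print(rows_if_total_exceeds(1000))  # []
```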
Another way to think about having is as a where clause applied to a subquery; i.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.
HAVING without a GROUP BY clause is perfectly valid, but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
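The "exactly one row even if WHERE matched nothing" point is easy to check. A small sketch with Python's sqlite3 module and a hypothetical `employees` table contrasts the implicit single group with an explicit GROUP BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department TEXT, wage INTEGER);
    INSERT INTO employees VALUES ('sales', 60), ('sales', 50), ('it', 30);
""")

# No GROUP BY: one implicit group, so exactly one row comes back
# even though the WHERE clause matches nothing.
implicit = conn.execute(
    "SELECT SUM(wage), COUNT(*) FROM employees WHERE wage > 1000"
).fetchall()

# Explicit GROUP BY: no matching rows means no groups, so no rows.
explicit = conn.execute(
    "SELECT department, SUM(wage) FROM employees WHERE wage > 1000 GROUP BY department"
).fetchall()

print(implicit)  # [(None, 0)]
print(explicit)  # []
```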
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 100
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 100
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total
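A runnable sketch of the same EXISTS pattern using Python's sqlite3 module, with hypothetical `departments`/`employees` data. It adds an explicit GROUP BY inside the subquery (an adjustment, not part of the answer) so that SQLite versions that reject HAVING without GROUP BY still accept it; the result is the same because the subquery is already restricted to one department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department TEXT);
    CREATE TABLE employees (department TEXT, wage INTEGER);
    INSERT INTO departments VALUES ('sales'), ('it'), ('hr');
    INSERT INTO employees VALUES ('sales', 60), ('sales', 50), ('it', 30), ('it', 20);
""")

rows = conn.execute("""
    SELECT d.department
    FROM departments d
    WHERE EXISTS (
        SELECT 1
        FROM employees e
        WHERE e.department = d.department
        GROUP BY e.department   -- explicit, so older SQLite versions accept HAVING
        HAVING SUM(e.wage) > 100
    )
    ORDER BY d.department
""").fetchall()

print(rows)  # [('sales',)]
```

'it' totals only 50, and 'hr' has no employees at all, so neither passes the HAVING filter.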
In SQL you cannot return non-aggregated columns alongside aggregate functions directly; you need to group by the non-aggregated fields, as in the example below:
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case your table has no identity column; one could be added as below:
ALTER TABLE [dbo].[_abc]
ADD Id_new INT IDENTITY(1, 1)
Go

how to find maximum of sum of number using if else in procedure in sap hana sql

I want to list the product with the highest sales amount for each date.
Note: "highest sales amount" in the sense of max(sum(sales_amnt))...
using IF or CASE in a procedure in SAP HANA SQL.
I did it using a WITH clause:
/*-------------------------- CORRECT ONE ----------------------------------------------*/
WITH ranked AS
(
SELECT Dense_RAnk() OVER (ORDER BY SUM("SALES_AMNT"), "SALES_DATE", "PROD_NAME") as rank,
SUM("SALES_AMNT") AS Amount, "PROD_NAME",count(*), "SALES_DATE" FROM "KABIL"."DATE"
GROUP BY "SALES_DATE", "PROD_NAME"
)
SELECT "SALES_DATE", "PROD_NAME",Amount
FROM ranked
WHERE rank IN ( select MAX(rank) from ranked group by "SALES_DATE")
ORDER BY "SALES_DATE" DESC;
this is my table
You cannot use IF inside a SELECT statement. Note that you can achieve most boolean logic with CASE expression syntax.
In a SELECT, you are applying it over a column, and your logic will be executed once for every row of the result set. Hence, writing imperative logic there is not recommended. Still, if you want to do so, create a calculation view and use intermediate calculated columns to achieve what you are expecting.
Try this... I got an answer:
select "SALES_DATE", "PROD_NAME", sum("SALES_AMNT")
from "KABIL"."DATE"
group by "SALES_DATE", "PROD_NAME"
having (SUM("SALES_AMNT"), "SALES_DATE") IN (
    select MAX(SUM_SALES), "SALES_DATE"
    from (
        select SUM("SALES_AMNT") as SUM_SALES, "SALES_DATE", "PROD_NAME"
        from "KABIL"."DATE"
        group by "SALES_DATE", "PROD_NAME"
    )
    group by "SALES_DATE");
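A common alternative to both queries above is to rank the per-date totals with a window function partitioned by date and keep rank 1. This is a swapped-in technique, not one of the posted answers, and it is shown here in SQLite (window functions need SQLite 3.25+) through Python's sqlite3 module rather than in HANA; the `sales` table and its rows are hypothetical stand-ins for "KABIL"."DATE".

```python
import sqlite3

# Hypothetical "sales" table standing in for "KABIL"."DATE".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sales_date TEXT, prod_name TEXT, sales_amnt INTEGER);
    INSERT INTO sales VALUES
        ('2017-01-01', 'A', 10), ('2017-01-01', 'A', 15), ('2017-01-01', 'B', 20),
        ('2017-01-02', 'A', 5),  ('2017-01-02', 'B', 7),  ('2017-01-02', 'B', 1);
""")

rows = conn.execute("""
    WITH totals AS (      -- one row per (date, product) with its summed amount
        SELECT sales_date, prod_name, SUM(sales_amnt) AS amount
        FROM sales
        GROUP BY sales_date, prod_name
    ),
    ranked AS (           -- rank products within each date, highest amount first
        SELECT sales_date, prod_name, amount,
               RANK() OVER (PARTITION BY sales_date ORDER BY amount DESC) AS rnk
        FROM totals
    )
    SELECT sales_date, prod_name, amount
    FROM ranked
    WHERE rnk = 1
    ORDER BY sales_date DESC
""").fetchall()

print(rows)  # [('2017-01-02', 'B', 8), ('2017-01-01', 'A', 25)]
```

RANK() returns every product tied for the top amount on a date; ROW_NUMBER() would keep exactly one of them instead.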

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
I am confused because I have often seen count(*) used as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use COUNT(*), you are asking for the total number of rows, instead of aggregating by another condition. Your question whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you get one huge aggregate: the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that * means count every row, including rows containing NULLs! COUNT(expression) counts only the rows where the expression is not NULL, so to count rows meeting a specific condition use something like COUNT(CASE WHEN condition THEN 1 END). See the docs about aggregate functions for more details on this topic.
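A short sketch with Python's sqlite3 module showing the three COUNT variants side by side, plus the per-value grouping from the example above. The `my_table` rows are hypothetical; in SQLite, NULLs form their own group and sort first in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (column_a TEXT);
    INSERT INTO my_table VALUES ('value1'), ('value1'), ('value2'), (NULL);
""")

totals = conn.execute("""
    SELECT COUNT(*),                                        -- every row, NULLs included
           COUNT(column_a),                                 -- only non-NULL column_a
           COUNT(CASE WHEN column_a = 'value1' THEN 1 END)  -- only rows meeting a condition
    FROM my_table
""").fetchone()

per_value = conn.execute("""
    SELECT column_a, COUNT(*)
    FROM my_table
    GROUP BY column_a
    ORDER BY column_a
""").fetchall()

print(totals)     # (4, 3, 2)
print(per_value)  # [(None, 1), ('value1', 2), ('value2', 1)]
```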
If you don't use the Group by clause at all then all that will be returned is a count of 1 for each row, which is already assumed anyway and therefore redundant data. By adding GROUP BY 1 you have categorized the information thereby making it non-redundant even though it returns the same result in theory as the statement that creates an error.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, this is not good practice in SQL.
You should name the column, because the order of the columns in the select list may change, which makes this code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;
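A quick check, using Python's sqlite3 module with a hypothetical two-column `my_table`, that `group by 1` refers to the first item in the select list, which is why reordering that list silently changes what is grouped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (column_a TEXT, column_b INTEGER);
    INSERT INTO my_table VALUES ('x', 1), ('x', 2), ('y', 3);
""")

by_position = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table GROUP BY 1 ORDER BY 1"
).fetchall()
by_name = conn.execute(
    "SELECT column_a, COUNT(*) FROM my_table GROUP BY column_a ORDER BY column_a"
).fetchall()

# "1" is the first select-list item, not the table's first column:
# with column_b listed first, GROUP BY 1 now groups by column_b.
reordered = conn.execute(
    "SELECT column_b, COUNT(*) FROM my_table GROUP BY 1 ORDER BY 1"
).fetchall()

print(by_position == by_name)  # True
print(reordered)               # [(1, 1), (2, 1), (3, 1)]
```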

Group by or Distinct - But several fields

How can I use DISTINCT or GROUP BY on one field while selecting all fields, or at least several of them?
Example (using SQL Server):
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works perfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and then get the corresponding rows:
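The ROW_NUMBER() approach can be exercised in SQLite through Python's sqlite3 module. This sketch uses a hypothetical, trimmed-down C_Product_Tempo (only four of the question's columns) and standard `AS rn` aliasing, since the `rn = row_number() ...` form is T-SQL-specific.

```python
import sqlite3

# Hypothetical, trimmed-down C_Product_Tempo (a few of the question's columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C_Product_Tempo (id_product INTEGER, nom_fr TEXT, id_mark INTEGER, description_fr TEXT);
    INSERT INTO C_Product_Tempo VALUES
        (1, 'chaise', 5, 'a'), (2, 'chaise', 3, 'b'),
        (3, 'table',  7, 'c'), (4, 'table',  9, 'd'),
        (5, 'lampe',  1, 'e');
""")

rows = conn.execute("""
    SELECT id_product, nom_fr, id_mark, description_fr
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY nom_fr ORDER BY id_mark) AS rn,
               id_product, nom_fr, id_mark, description_fr
        FROM C_Product_Tempo
    ) d
    WHERE rn = 1   -- keep only the row with the lowest id_mark per nom_fr
    ORDER BY nom_fr
""").fetchall()

print(rows)  # [(2, 'chaise', 3, 'b'), (5, 'lampe', 1, 'e'), (3, 'table', 7, 'c')]
```

The ORDER BY inside OVER decides which row counts as "first"; changing it to ORDER BY id_product would pick a different 'chaise' row here.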
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
An SQLfiddle to test with.
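The MIN(id_product) subquery can be tried the same way with Python's sqlite3 module. The data below is hypothetical; note that this keeps the lowest id_product per nom_fr, which is not necessarily the same row a ROW_NUMBER() ordered by id_mark would keep (for 'chaise' it returns id 1, whose id_mark is 5).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C_Product_Tempo (id_product INTEGER, nom_fr TEXT, id_mark INTEGER);
    INSERT INTO C_Product_Tempo VALUES
        (1, 'chaise', 5), (2, 'chaise', 3),
        (3, 'table',  7), (4, 'table',  9),
        (5, 'lampe',  1);
""")

rows = conn.execute("""
    SELECT id_product, nom_fr
    FROM C_Product_Tempo
    WHERE id_product IN (
        SELECT MIN(id_product)   -- lowest id per nom_fr
        FROM C_Product_Tempo
        GROUP BY nom_fr
    )
    ORDER BY id_product
""").fetchall()

print(rows)  # [(1, 'chaise'), (3, 'table'), (5, 'lampe')]
```

This relies on id_product being unique; with duplicate ids, IN could return more than one row per nom_fr.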
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields, do you want the values from a particular record if there is more than one with the same nom_fr?
Some SQL systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not: you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so there is no point grouping without counting, summing up etc. DISTINCT eliminates duplicates, but I can't see how that would match up with the other columns you want to extract, because some rows will be removed from the result.

sql divide column by column max

I have a count column and want to divide it by the max of that column to get a rate.
I tried
select t.count/max(t.count)
from table t
group by t.count
but failed.
I also tried the one without GROUP BY, still failed.
Ordering the counts descending and picking the first one as the divisor didn't work in my case. Consider that I have different counts per product subcategory: for each product category, I want to divide the count of each subcategory by the max count in that category. I can't think of a way to avoid an aggregate function.
If you want the MAX() per category you need a correlated subquery:
select t.count*1.0/(SELECT max(a.count)
FROM table a
WHERE t.category = a.category)
from table t
Or you need to PARTITION BY your MAX()
select t.count*1.0/(max(t.count) over (PARTITION BY category))
from table t
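Both per-category forms (windowed MAX and correlated subquery) can be compared in SQLite via Python's sqlite3 module. The `counts` table, its rows, and the column name `cnt` (standing in for the question's `count`) are hypothetical.

```python
import sqlite3

# Hypothetical "counts" table; "cnt" stands in for the question's "count" column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE counts (category TEXT, subcategory TEXT, cnt INTEGER);
    INSERT INTO counts VALUES
        ('bikes', 'road', 50), ('bikes', 'mountain', 100),
        ('parts', 'chains', 20), ('parts', 'tires', 80);
""")

# Window function: the max is computed per category without collapsing rows.
windowed = conn.execute("""
    SELECT category, subcategory,
           cnt * 1.0 / MAX(cnt) OVER (PARTITION BY category) AS rate
    FROM counts
    ORDER BY category, subcategory
""").fetchall()

# Correlated subquery: the inner MAX uses the inner alias a, matched on category.
correlated = conn.execute("""
    SELECT t.category, t.subcategory,
           t.cnt * 1.0 / (SELECT MAX(a.cnt) FROM counts a
                          WHERE a.category = t.category) AS rate
    FROM counts t
    ORDER BY t.category, t.subcategory
""").fetchall()

print(windowed)
print(windowed == correlated)  # True
```

Neither query needs a GROUP BY: each input row keeps its own count and only the divisor is aggregated.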
The following works in all dialects of SQL:
select t.count/(select max(count) from table)
from table t
group by t.count;
Note that some versions of SQL do integer division, so the result will be either 0 or 1. You can fix this by multiplying by 1.0 or casting to a float.
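SQLite is one of the dialects that does integer division when both operands are integers, which makes the caveat easy to see through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# integer / integer truncates; multiplying by 1.0 or casting to REAL avoids it
result = conn.execute("SELECT 3 / 4, 3 * 1.0 / 4, CAST(3 AS REAL) / 4").fetchone()

print(result)  # (0, 0.75, 0.75)
```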
Most versions of SQL also support:
select t.count/(max(t.count) over ())
from table t
group by t.count;
The same caveat applies about integer division.
You might want to try using a subquery to derive the max value (including both in the same query might not work the way that you are expecting, since you are grouping on the same column that you are aggregating)
Select t.count / (select max(sub.count) from table sub)
from table t
group by t.count