Condition inside a COUNT - SQL

I am trying to write a condition inside a COUNT so that it only counts the entries which do not have an ENDDATE. I want the condition inside the count itself, because this is a very small part of a large SQL query.

Sample query:
select product, count(*) as quantity
from table
where end_date is null
group by product
This query lists the quantity for each product, counting only the entries that do not have an end date.

One method uses conditional aggregation:
select sum(case when end_date is null then 1 else 0 end) as NumNull
. . .
Another method is just to subtract two counts:
select ( count(*) - count(end_date) ) as NumNull
count(end_date) counts the number that are not NULL, so subtracting this from the full count gets the number that are NULL.

Uhmmmm.
It sounds like you are looking for conditional aggregation.
So, if you have a current statement that's sort of working (and we're just guessing because we don't see anything you have attempted so far...)
SELECT COUNT(1)
FROM mytable t
And you want another expression that returns a count of rows that meet some set of conditions...
and when you say "do not have an ENDDATE", you are referring to rows that have an ENDDATE value of NULL (and again, we're just guessing that the table has a column named ENDDATE. Every row will have an ENDDATE column.)
We'll use an ANSI standards compliant CASE expression, because this would work in most databases (SQL Server, Oracle, MySQL, Postgres...), and we don't have a clue what database you are using.
SELECT COUNT(1)
, COUNT(CASE WHEN t.ENDDATE IS NULL THEN 1 ELSE NULL END) AS cnt_null_enddate
FROM mytable t
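To tie this back to the sample query in the question, here is a minimal sketch that puts the condition inside the count while still grouping by product. The table name product_entries and its rows are made up for illustration. The DATE '...' literal is standard SQL, but some engines (SQL Server among them) expect a plain string such as '2020-01-31' instead.

```sql
-- Hypothetical table and data, only for illustration.
CREATE TABLE product_entries (
    product  VARCHAR(20),
    end_date DATE
);

INSERT INTO product_entries (product, end_date) VALUES ('apple', NULL);
INSERT INTO product_entries (product, end_date) VALUES ('apple', DATE '2020-01-31');
INSERT INTO product_entries (product, end_date) VALUES ('pear',  NULL);
INSERT INTO product_entries (product, end_date) VALUES ('pear',  NULL);
INSERT INTO product_entries (product, end_date) VALUES ('pear',  DATE '2020-02-29');

SELECT product,
       COUNT(*)                                     AS all_entries,
       -- CASE without ELSE yields NULL, and COUNT skips NULLs
       COUNT(CASE WHEN end_date IS NULL THEN 1 END) AS open_entries,
       -- same number, computed by subtracting the non-NULL count
       COUNT(*) - COUNT(end_date)                   AS open_entries_alt
FROM product_entries
GROUP BY product
ORDER BY product;

-- apple: all_entries = 2, open_entries = 1, open_entries_alt = 1
-- pear:  all_entries = 3, open_entries = 2, open_entries_alt = 2
```

Because there is no WHERE filter on end_date, any other aggregates in the larger query still see every row.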

Related

Oracle order by query using select case

In Oracle Live SQL I was trying a simple ORDER BY that uses a select (case when) subquery.
I tried to get the same result as select * from tt order by 1
by replacing 1 with (select (case when 1=1 then 1 else 2 end) from dual),
but the two results are completely different.
I want the table ordered by column 1; however, the query using the select case when subquery doesn't sort by column 1.
I don't know why, and I want to know how this query works in Oracle DB.
Compare
...
order by 2
and
...
order by 1+1
At "compile" time, the 2 in the first query is an integer constant, so it is treated as a column position and the db engine sorts by that column. The 1+1 in the second query is an integer expression, so the db engine sorts by its value, 2, which is the same for every row. Likewise, (select (case when 1=1 then 1 else 2 end) from dual) is an expression, not a column position.
When you specify a number in the ORDER BY clause, Oracle will sort by that column of the resulting select. As an example, ORDER BY 1,2 will sort by the first column, then the second column. If there is no second column, then you will get an error.
In the ORDER BY of the outermost query, there is essentially no sorting happening in your query because 1 is always returned from your subquery. This is sorting by the value 1 and not the first column.
If you explain the logic you are hoping to achieve, then we may be able to assist, but that is what is happening with your existing queries.
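If the goal is to pick the sort column with a condition, the CASE has to return a column value rather than a column position. A minimal sketch in Oracle syntax, assuming a made-up two-column table tt(a, b), since the question does not show the real table definition:

```sql
-- Hypothetical table; the real definition of tt is not shown in the question.
CREATE TABLE tt (a NUMBER, b NUMBER);
INSERT INTO tt (a, b) VALUES (3, 10);
INSERT INTO tt (a, b) VALUES (1, 30);
INSERT INTO tt (a, b) VALUES (2, 20);

-- Integer literal: treated as a column position, so this sorts by a.
SELECT a, b FROM tt ORDER BY 1;

-- Scalar subquery: evaluates to the constant 1 for every row,
-- so it imposes no particular order on the rows.
SELECT a, b
FROM tt
ORDER BY (SELECT CASE WHEN 1 = 1 THEN 1 ELSE 2 END FROM dual);

-- CASE that returns a column value: sorts by a when the condition
-- holds, otherwise by b.
SELECT a, b
FROM tt
ORDER BY CASE WHEN 1 = 1 THEN a ELSE b END;
```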

Use of HAVING without GROUP BY not working as expected

I am starting to learn SQL Server. The documentation on MSDN states:
HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
This made me think that we can use HAVING without a GROUP BY clause, but when I try to write such a query I am not able to use it.
I have a table like this
CREATE TABLE [dbo].[_abc]
(
[wage] [int] NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[_abc] (wage)
VALUES (4), (8), (15), (30), (50)
GO
Now when I run this query, I get an error
select *
from [dbo].[_abc]
having sum(wage) > 5
Error:
The documentation is correct; i.e. you could run this statement:
select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5
The reason your statement doesn't work is the select *, which means select every column's value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set, which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.
There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.
NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:
select *
from [dbo].[_abc]
where wage > 5
Equally, if you want the sum of all wages greater than 5 you can do this
select sum(wage) sum_of_wage_over_5
from [dbo].[_abc]
where wage > 5
Or if you wanted to compare the sum of wages over 5 with those under:
select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end
Update based on comments:
Do you need having to use aggregate functions?
No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.
The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by. In such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1.
What's the point?
It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.
One use case could be that you write code to monitor your licenses for some software: if you have fewer users than per-user licenses, all's good and you don't want to see the result, since you don't care. If you have more users, you want to know about it. E.g.
declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses
Isn't the select irrelevant to the having clause?
Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.
I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:
select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)
However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.
Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:
select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5
That won't run because wage is not available to the outer query; only sum_of_wage is returned.
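A runnable variant of the same idea, as a sketch: keep the aggregate in the inner query and filter it from the outside. It selects sum_of_wage instead of wage and leaves out the conceptual group by 1. The table name wage_demo is made up; the values are the ones from the question.

```sql
-- Hypothetical stand-in for [dbo].[_abc], using the question's values.
CREATE TABLE wage_demo (wage INT);
INSERT INTO wage_demo (wage) VALUES (4);
INSERT INTO wage_demo (wage) VALUES (8);
INSERT INTO wage_demo (wage) VALUES (15);
INSERT INTO wage_demo (wage) VALUES (30);
INSERT INTO wage_demo (wage) VALUES (50);

-- HAVING without GROUP BY: filters the single aggregated row.
SELECT SUM(wage) AS sum_of_wage
FROM wage_demo
HAVING SUM(wage) > 5;

-- Same filter written as a WHERE on a one-row subquery.
SELECT sum_of_wage
FROM (
    SELECT SUM(wage) AS sum_of_wage
    FROM wage_demo
) single_row_result
WHERE sum_of_wage > 5;

-- Both queries return one row: sum_of_wage = 107
```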
HAVING without GROUP BY clause is perfectly valid but here is what you need to understand:
The result will contain zero or one row
The implicit GROUP BY will return exactly one row even if the WHERE condition matched zero rows
HAVING will keep or eliminate that single row based on the condition
Any column in the SELECT clause needs to be wrapped inside an aggregate function
You can also specify an expression as long as it is not functionally dependent on the columns
Which means you can do this:
SELECT SUM(wage)
FROM employees
HAVING SUM(wage) > 100
-- One row containing the sum if the sum is greater than 100
-- Zero rows otherwise
Or even this:
SELECT 1
FROM employees
HAVING SUM(wage) > 100
-- One row containing "1" if the sum is greater than 100
-- Zero rows otherwise
This construct is often used when you're interested in checking if a match for the aggregate was found:
SELECT *
FROM departments
WHERE EXISTS (
SELECT 1
FROM employees
WHERE employees.department = departments.department
HAVING SUM(wage) > 100
)
-- all departments whose employees earn more than 100 in total
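The zero-or-one-row behaviour described in this answer can be seen with a small sketch. The employees table here is a made-up stand-in with only the two columns the queries use.

```sql
-- Hypothetical table and data, only for illustration.
CREATE TABLE employees (department VARCHAR(20), wage INT);
INSERT INTO employees (department, wage) VALUES ('sales', 60);
INSERT INTO employees (department, wage) VALUES ('sales', 70);
INSERT INTO employees (department, wage) VALUES ('support', 40);

-- Aggregate without GROUP BY: exactly one row, even though the
-- WHERE condition matches no rows at all.
SELECT COUNT(*) AS cnt, SUM(wage) AS total
FROM employees
WHERE department = 'none';
-- one row: cnt = 0, total = NULL

-- HAVING keeps or removes that single row.
SELECT COUNT(*) AS cnt
FROM employees
WHERE department = 'none'
HAVING COUNT(*) > 0;
-- zero rows

SELECT SUM(wage) AS total
FROM employees
HAVING SUM(wage) > 100;
-- one row: total = 170
```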
In SQL you cannot select plain columns alongside aggregate functions unless you group by those columns,
as the example below shows.
USE AdventureWorks2012 ;
GO
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID ;
In your case the table has no identity column, so you would first add one, as below:
ALTER TABLE [dbo].[_abc]
ADD Id_new INT IDENTITY(1, 1)
GO

Netezza SQL statement to return count or null based on day of week

I'm a beginner with SQL. I'm trying to determine if there's a way to write a SQL statement that will return a null value for certain days of the week, and a count on other days of the week. I can't use a script (the interface I'm using only allows me to execute a single statement).
The logic is something like this:
if max(as_of_date) is a Saturday or Sunday, then return null
Else select count(*) from table where (etc).
I assume that AS_OF_DATE is a column in your source table, that your output should be only one row, and that if even ONE row in the source table holds a record with a relevant date, the query should return non-null. Please elaborate on the question (desired input/output would be nice).
Select
case when cnt>0
then cnt
end
from
( select count(*) cnt
from THE_TABLE
where EXTRACT(dow FROM AS_OF_DATE) not in (1,7)
) x
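If the rule in the question is taken literally (look only at the latest AS_OF_DATE), a single statement could look like the sketch below. It keeps the assumption of the query above that EXTRACT(dow ...) returns 1 for Sunday and 7 for Saturday; this numbering differs between databases (PostgreSQL, for example, uses 0 for Sunday and 6 for Saturday), so check it on your system. The table name the_table_demo, the status column and the 'OPEN' filter are placeholders for the elided "where (etc)" condition.

```sql
-- Hypothetical table and data; the date strings rely on implicit
-- string-to-date conversion, which may need adjusting per database.
CREATE TABLE the_table_demo (as_of_date DATE, status VARCHAR(10));
INSERT INTO the_table_demo (as_of_date, status) VALUES ('2024-01-04', 'OPEN');
INSERT INTO the_table_demo (as_of_date, status) VALUES ('2024-01-05', 'CLOSED');

SELECT CASE
           -- latest date falls on a weekend: return NULL
           WHEN EXTRACT(dow FROM MAX(as_of_date)) IN (1, 7) THEN NULL
           -- otherwise count only the rows matching the extra condition
           ELSE COUNT(CASE WHEN status = 'OPEN' THEN 1 END)
       END AS cnt
FROM the_table_demo;

-- 2024-01-05 is a Friday, so the ELSE branch applies here.
```

The condition sits inside the COUNT rather than in a WHERE clause so that MAX(as_of_date) is still computed over every row.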

SQL - Count Duplicate Values Within Date Range

I have a simple, but large, database that I need to write a SQL statement for. The statement needs to do the following:
Get the 15 most popular values for a field.
From those 15, get the count that value has appeared within a particular time period.
My table contains both a Date and a Value field. I am able to extract the 15 most popular values, or get the count for a particular value in a given time period. I do not know how to put the two together.
This is my current SQL:
SELECT
Count( Value ) AS Total,
Value AS Value
FROM
Database
GROUP BY
Value
ORDER BY
Total DESC
LIMIT 15
That gets my 15 most popular values. But for each of those, I want to display the COUNT() of rows whose Date falls between two given dates.
Would this require a HAVING clause?
I simplified the previous solution (which would also do a job) a little bit:
SELECT
Value,
Count(*) as TotalInPeriod
FROM Database
WHERE Value in (SELECT Value FROM Database GROUP BY Value
ORDER BY count(*) DESC LIMIT 15)
AND date_field BETWEEN your_start_date and your_end_date
GROUP BY Value
Try something like this. Make an inner query that finds the top 15 values overall, and join it to the main set to limit it to those values.
SELECT
Count( a.Value ) as TotalInPeriod,
a.Value as Value
FROM
Database a
JOIN (SELECT
Count( Value ) AS Total,
Value AS Value
FROM
Database
GROUP BY
Value
ORDER BY
Total DESC
LIMIT 15) as topValues
ON
a.Value = topValues.Value
WHERE
a.date_field BETWEEN your_start_date and your_end_date
GROUP BY
a.Value
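One thing to watch with either query: a top value that has no rows in the date range drops out of the result entirely. A sketch of a variant that keeps such values with a count of 0, using a LEFT JOIN with the date filter in the join condition. The table events, its columns and its rows are made up, the example takes the top 2 instead of the top 15 to stay small, and it assumes a database with LIMIT (MySQL or PostgreSQL style).

```sql
-- Hypothetical table and data, only for illustration.
CREATE TABLE events (val VARCHAR(10), event_date DATE);
INSERT INTO events (val, event_date) VALUES
    ('a', '2024-01-10'), ('a', '2024-01-20'), ('a', '2024-02-05'), ('a', '2024-03-01'),
    ('b', '2024-02-01'), ('b', '2024-02-02'), ('b', '2024-03-03'),
    ('c', '2024-01-15');

SELECT top_values.val,
       top_values.total,
       COUNT(e.val) AS total_in_period   -- counts only matched rows
FROM (
    -- most popular values overall
    SELECT val, COUNT(*) AS total
    FROM events
    GROUP BY val
    ORDER BY total DESC
    LIMIT 2
) top_values
LEFT JOIN events e
       ON e.val = top_values.val
      AND e.event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY top_values.val, top_values.total
ORDER BY top_values.total DESC;

-- a: total = 4, total_in_period = 2
-- b: total = 3, total_in_period = 0
```

Putting the date range in the ON clause rather than in WHERE is what preserves the row for b.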

SQL subquery within a SELECT calculation

Very new to subqueries and find myself in need of help.
I would like to query from a single database. Within that query, I would like to calculate a variable from two variables within that database (SUBQ and TOTAL). My issue is this: my SUBQ variable needs to be subject to an additional set of WHERE constraints on top of those that will be employed for the whole query. Simplified example code below:
create table [blah]
as select date_part('YEAR',DATE) as Orig_Year,
sum([SUBQ variable])/sum(TOTAL) as UD_Rate
from [database]
where [full query requirements]
group by date_part('YEAR',DATE)
I have tried to create a subquery within that calculation by specifying a subquery in the FROM statement. So, for example,
select date_part('YEAR',DATE1) as Orig_year,
sum(a.SUBQ)/sum(b.TOTAL) as UD_Rate
from database b,
(select SUBQ
from database
where DATE2 is not null and
months_between(DATE3,DATE2) <= 100 and
VALUE1 in ('A','B')) a
where VALUE2 between 50.01 and 100
group by date_part('YEAR',DATE1)
Am I on the right track with my thinking here? I have yet to get anywhere close to a functional query and have had little luck finding a similar question online, so I'm at the point where I've tossed up my hands and come to you. Though I know little about them, would it be more appropriate to create a VIEW with the SUBQ value, and then merge it with the broader query?
Thoughts of pies and cakes for whoever is willing to assist me with this request. Thank you.
I think you just want conditional aggregation. Something like this:
select sum(case when [subquery requirements] then t.subq else 0 end) / sum(t.Total)
from t;
I'm pretty sure this is what you are looking for. In terms of your create table:
select date_part('YEAR',DATE) as Orig_Year,
sum(case when ?? then Total else 0 end)/sum(TOTAL) as UD_Rate
from [database]
where [full query requirements]
group by date_part('YEAR', DATE);
I am guessing that the column to be compared is Total, subject to the conditions in the when.
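A self-contained sketch of that pattern, with made-up names: the table loans, its columns and the rule value1 in ('A','B') stand in for the real table and the "[subquery requirements]". NULLIF guards against division by zero when a group's total is 0.

```sql
-- Hypothetical table and data, only for illustration.
CREATE TABLE loans (
    orig_year INT,
    total     DECIMAL(10, 2),
    value1    CHAR(1)
);
INSERT INTO loans (orig_year, total, value1) VALUES (2020, 100.00, 'A');
INSERT INTO loans (orig_year, total, value1) VALUES (2020, 300.00, 'C');
INSERT INTO loans (orig_year, total, value1) VALUES (2021,  50.00, 'B');
INSERT INTO loans (orig_year, total, value1) VALUES (2021,  50.00, 'A');

SELECT orig_year,
       -- numerator: only rows meeting the extra condition contribute
       SUM(CASE WHEN value1 IN ('A', 'B') THEN total ELSE 0 END)
       -- denominator: every row that passed the query-wide WHERE
       / NULLIF(SUM(total), 0) AS ud_rate
FROM loans
GROUP BY orig_year
ORDER BY orig_year;

-- 2020: 100 / 400 = 0.25
-- 2021: 100 / 100 = 1
```

If the summed columns are integers, some databases perform integer division here; multiplying the numerator by 1.0 avoids that.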
Alternatively, use a common table expression (CTE):
-- Define the CTE expression name and column list.
WITH subquery (Orig_year, UD_Rate)
AS
-- Define the CTE query.
(
select date_part('YEAR',DATE1) as Orig_year,
sum(a.SUBQ)/sum(b.TOTAL) as UD_Rate
from database b,
(select SUBQ
from database
where DATE2 is not null and
months_between(DATE3,DATE2) <= 100 and
VALUE1 in ('A','B')) a
where VALUE2 between 50.01 and 100
group by date_part('YEAR',DATE1)
)
Then use subquery as you would use a table inside your main query.