SQL CASE WHEN: can I use a function within a function? New to SQL

SELECT
SP.SITE,
SYS.COMPANY,
SYS.ADDRESS,
SP.CUSTOMER,
SP.STATUS,
DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) AS MONTH_COUNT
CASE WHEN(MONTH_COUNT = 0 THEN MONTH_COUNT = DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES) AS DAY_COUNT)
ELSE NULL
END
FROM SALEPASSES AS SP
INNER JOIN SYSTEM AS SYS ON SYS.SITE = SP.SITE
WHERE STATUS IN (7,27,29);
I am still trying to understand SQL. Is this the right order to have everything? I'm assuming my datediff() is unable to work because it's inside case when. What I am trying to do, is get the day count if month_count is less than 1 (meaning it's less than one month and we need to count the days between the dates instead). I need month_count to run first to see if doing the day_count would even be necessary. Please give me feedback, I'm new and trying to learn!

CASE is an expression: it returns a value. It looks like you should be doing this:
DAY_COUNT =
CASE WHEN DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) = 0
THEN DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES)
ELSE NULL END
You shouldn't actually need ELSE NULL, as NULL is the default when no WHEN branch matches.
Note also that you [usually] cannot refer to a derived column in the same SELECT.
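As a quick check of the "NULL is the default" point, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server purely for convenience; that substitution is an assumption of this example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two CASE expressions without ELSE: the first has a matching WHEN branch,
# the second does not, so it falls through to the implicit ELSE NULL.
row = conn.execute(
    "SELECT CASE WHEN 1 = 1 THEN 'matched' END, "
    "       CASE WHEN 1 = 0 THEN 'matched' END"
).fetchone()
print(row)
conn.close()
```

The second column comes back as Python's None, which is how sqlite3 represents SQL NULL.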

It appears that what you are trying to do is define the MonthCount column's value, and then reuse that value in another column's definition. (The Don't Repeat Yourself principle.)
In most dialects of SQL, including MS SQL Server, you can't do that.
That's because SQL is a "declarative" language. This means that SQL Server is free to calculate the column values in any order that it likes. In turn, that means you're not allowed to do anything that would rely on one column being calculated before another.
There are two basic ways around that...
First, use CTEs or sub-queries to create two different "scopes", allowing you to define MonthCount before DayCount, and so reuse the value without retyping the definition.
SELECT
*,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
(
SELECT
*,
bar AS MonthCount
FROM
x
)
AS derive_month
The second main way is to somehow derive the value before the SELECT block is evaluated. In this case, that means using APPLY to 'join' a single value onto each input row...
SELECT
x.*,
MonthCount,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
x
CROSS APPLY
(
SELECT
bar AS MonthCount
)
AS derive_month
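The sub-query pattern can be tried end to end with a small sketch. It runs on SQLite through Python's sqlite3 module, which is an assumption made for portability: SQLite has neither DATEDIFF nor CROSS APPLY, so the month difference is spelled out with strftime and the day difference with julianday. The passes table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the SALEPASSES table from the question.
conn.execute("CREATE TABLE passes (customer TEXT, member_since TEXT, expires TEXT)")
conn.executemany(
    "INSERT INTO passes VALUES (?, ?, ?)",
    [
        ("A", "2024-01-05", "2024-01-20"),  # same calendar month
        ("B", "2024-01-05", "2024-04-05"),  # three calendar months apart
    ],
)

query = """
SELECT customer,
       month_count,
       -- month_count is visible here because the inner query defined it
       CASE WHEN month_count = 0
            THEN CAST(julianday(expires) - julianday(member_since) AS INTEGER)
       END AS day_count
FROM (
    SELECT *,
           (CAST(strftime('%Y', expires) AS INTEGER)
              - CAST(strftime('%Y', member_since) AS INTEGER)) * 12
           + CAST(strftime('%m', expires) AS INTEGER)
           - CAST(strftime('%m', member_since) AS INTEGER) AS month_count
    FROM passes
) AS derive_month
ORDER BY customer
"""
rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
conn.close()
```

Customer A gets a day count because its month count is 0; customer B gets NULL (None in Python) for the day count.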

Related

Django ORM remove unwanted Group by when annotate multiple aggregate columns

I want to create a query something like this in django ORM.
SELECT COUNT(CASE WHEN myCondition THEN 1 ELSE NULL end) as numyear
FROM myTable
Following is the Django ORM query I have written:
year_case = Case(When(added_on__year = today.year, then=1), output_field=IntegerField())
qs = (ProfaneContent.objects
.annotate(numyear=Count(year_case))
.values('numyear'))
This is the query which is generated by django orm.
SELECT COUNT(CASE WHEN "analyzer_profanecontent"."added_on" BETWEEN 2020-01-01 00:00:00+00:00 AND 2020-12-31 23:59:59.999999+00:00 THEN 1 ELSE NULL END) AS "numyear" FROM "analyzer_profanecontent" GROUP BY "analyzer_profanecontent"."id"
Everything else is fine, but Django adds a GROUP BY at the end, which produces multiple rows and an incorrect answer. I don't want that at all. Right now there is just one column, but I will add more such columns.
EDIT BASED ON COMMENTS
I will be using the qs variable to get values of how my classifications have been made in the current year, month, week.
UPDATE
On the basis of the comments and answers I am getting here, let me clarify. I want to do this at the database end only (obviously using the Django ORM and not raw SQL). It's a simple SQL query. Doing anything at Python's end will be inefficient since the data can be too large. That's why I want the database to get me the sum of records based on the CASE condition.
I will be adding more such columns in the future, so something like len() or .count() will not work.
I just want to create the above mentioned query using Django ORM (without an automatically appended GROUP BY).
When using aggregates in annotations, Django needs some kind of grouping; if none is given, it defaults to the primary key. So you need to use .values() before .annotate(). Please see the Django docs.
But to completely remove group by you can use a static value and django is smart enough to remove it completely, so you get your result using ORM query like this:
year_case = Case(When(added_on__year = today.year, then=1), output_field=IntegerField())
qs = (ProfaneContent.objects
.annotate(dummy_group_by = Value(1))
.values('dummy_group_by')
.annotate(numyear=Count(year_case))
.values('numyear'))
If you need to summarize to only one row, then you should use the .aggregate() method instead of annotate().
result = ProfaneContent.objects.aggregate(
numyear=Count(year_case),
# ... more aggregated expressions are possible here
)
You get a simple dictionary of result columns:
>>> result
{'numyear': 7, ...}
The generated SQL query is without groups, exactly how required:
SELECT
COUNT(CASE WHEN myCondition THEN 1 ELSE NULL end) as numyear
-- and more possible aggregated expressions
FROM myTable
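To see why the GROUP BY on the primary key matters, here is a small sketch that runs the same COUNT(CASE ...) expression with and without it. It uses Python's sqlite3 module rather than Django, and the content table with its added_year column is a hypothetical simplification of the model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for analyzer_profanecontent.
conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, added_year INTEGER)")
conn.executemany(
    "INSERT INTO content (added_year) VALUES (?)",
    [(2020,), (2020,), (2019,)],
)

count_expr = "COUNT(CASE WHEN added_year = 2020 THEN 1 END)"

# Grouping by the primary key yields one group (and one output row) per record.
grouped = conn.execute(
    f"SELECT {count_expr} FROM content GROUP BY id ORDER BY id"
).fetchall()

# Without GROUP BY the whole table is a single group, so there is one row.
ungrouped = conn.execute(f"SELECT {count_expr} FROM content").fetchall()

print(grouped)
print(ungrouped)
conn.close()
```

The grouped query returns a 0 or 1 per record, while the ungrouped query returns the single total, which is the shape .aggregate() produces.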
What about a list comprehension:
# get all the objects
profane = ProfaneContent.objects.all()
# Something like this
len([pro for pro in profane if pro.added_on.year == today.year])
If the years are equal, the object is added to the list, so at the end you can check the len()
to get the count.
Hopefully this is helpful!
This is how I would write it in SQL.
SELECT SUM(CASE WHEN myCondition THEN 1 ELSE 0 END) as numyear
FROM myTable
SELECT
SUM(CASE WHEN "analyzer_profanecontent"."added_on"
BETWEEN 2020-01-01 00:00:00+00:00
AND 2020-12-31 23:59:59.999999+00:00
THEN 1
ELSE 0
END) AS "numyear"
FROM "analyzer_profanecontent"
GROUP BY "analyzer_profanecontent"."id"
If you intend to use other items in the SELECT clause, I would recommend grouping by those items as well, which would look like this (otherColumn is a placeholder for such an item):
SELECT otherColumn, SUM(CASE WHEN myCondition THEN 1 ELSE 0 END) as numyear
FROM myTable
GROUP BY otherColumn

Using the total of a column of the queried table in a case when (Hive)

Simplified example:
In hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN '0'
When value>0.9*sum(value) over () THEN '1'
ELSE '2'
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute this only once. Added twist, I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN '0'
When value>0.9*(select * from total) THEN '1'
ELSE '2'
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables ?
Don't worry about that. Let the optimizer worry about it. But, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
and change the order of the WHEN clauses in the CASE expression because as they are, the 2nd case will never succeed.
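The point about WHEN order can be checked with a short sketch. It runs the cross-join query on SQLite via Python's sqlite3 module (an assumption; the question is about Hive), and Robb's value is changed to a hypothetical 94 so that one row actually exceeds 90% of the total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
# Hypothetical data: the total is 100, so Robb holds 94% of it.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Bob", 2), ("Betty", 4), ("Robb", 94)],
)

template = """
SELECT t.name, CASE {whens} ELSE '2' END AS var
FROM t CROSS JOIN (SELECT SUM(value) AS total FROM t) AS tt
ORDER BY t.name
"""
# CASE returns the first branch that matches, so the looser test shadows
# the stricter one when it comes first.
loose_first = (
    "WHEN t.value > 0.5 * tt.total THEN '0' "
    "WHEN t.value > 0.9 * tt.total THEN '1'"
)
strict_first = (
    "WHEN t.value > 0.9 * tt.total THEN '1' "
    "WHEN t.value > 0.5 * tt.total THEN '0'"
)

original_order = conn.execute(template.format(whens=loose_first)).fetchall()
corrected_order = conn.execute(template.format(whens=strict_first)).fetchall()
print(original_order)
print(corrected_order)
conn.close()
```

With the 0.5 test first, Robb is labelled '0' even though the 0.9 test also holds; with the 0.9 test first, Robb is labelled '1'.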
Since I/O is the major factor that slows down Hive queries, we should strive to reduce the number of stages to get better performance.
So it's better not to use a sub-query or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case window clause is the recommended way to reduce repetition of code.
Both the windowing and the sum aggregation will be computed only once. You can run explain select..., confirming that only ONE meaningful MR stage will be launched.
Edit:
1. A simple select clause on a subquery is not something to worry about. It can be pushed down to the last phase of the subquery, so as to avoid an additional MR stage.
2. Two identical aggregations residing in the same query block will only be evaluated once. So don’t worry about potential repeated calculation.

Return NULL instead of 0 when using COUNT(column) SQL Server

I have a query which runs fine and does two types of work, COUNT and SUM.
Something like
select
id,
Count (contracts) as countcontracts,
count(something1),
count(something1),
count(something1),
sum(cost) as sumCost
from
table
group by
id
My problem is: if there is no contract for a given ID, it will return 0 for COUNT and NULL for SUM. I want to see NULL instead of 0.
I was thinking about case when Count (contracts) = 0 then null else Count (contracts) end, but I don't want to do it this way because I have more than 12 count positions in the query and it's processing a large number of records, so I think it may slow down query performance.
Is there any other way to replace 0 with NULL?
Try this:
select NULLIF ( Count(something) , 0)
Here are three methods:
1. (case when count(contracts) > 0 then count(contracts) end) as countcontracts
2. sum(case when contracts is not null then 1 end) as countcontracts
3. nullif(count(contracts), 0)
All three of these require writing more complicated expressions. However, this really isn't that difficult. Just copy the line multiple times, and change the name of the variable on each one. Or, take the current query, put it into a spreadsheet and use spreadsheet functions to make the transformation. Then copy the function down. (Spreadsheets are really good code generators for repeated lines of code.)
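A minimal sketch of the NULLIF approach, run on SQLite through Python's sqlite3 module (SQLite is an assumption here; NULLIF behaves the same way in SQL Server). The table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, contracts TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "c-1", 10.0),
        (1, "c-2", 5.0),
        (2, None, None),  # id 2 has no contract and no cost
    ],
)

rows = conn.execute(
    """
    SELECT id,
           COUNT(contracts)            AS plain_count,
           NULLIF(COUNT(contracts), 0) AS null_when_zero,
           SUM(cost)                   AS sum_cost
    FROM t
    GROUP BY id
    ORDER BY id
    """
).fetchall()
for r in rows:
    print(r)
conn.close()
```

For id 2 the plain COUNT is 0, while the NULLIF-wrapped count is NULL, matching what SUM already returns for that group.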

Filter on a nested aggregate SUM function not working

I have these two tables (the names have been pluralized for the sake of the example):
Table Locations:
idlocation varchar(12)
name varchar(50)
Table Answers:
idlocation varchar(6)
question_number varchar(3)
answer_text1 varchar(300)
answer_text2 varchar(300)
This table can hold answers for multiple locations according to a list of numbered questions that repeat for each of them.
What I am trying to do is to add up the values residing in the answer_text1 and answer_text2 columns, for each location available in the Locations table but for only a specific question, and then output a value based on the result (1 or 0).
The query goes as follows using a nested table Answers to perform the SUM operation:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from Answers c
where b.idlocation=c.idlocation and c.question_number='05'
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
In the table Answers I am saving sometimes a date string type of value for its field answer_text2 but on a different question number.
When I run the query I get the following error:
Conversion failed when converting the varchar value '27/12/2013' to data type int
I do have that value '27/12/2013' on the answer_text2 field but for a different question, so my filter gets ignored on the nested select statement after this: b.idlocation=c.idlocation, and it's adding up apparently more questions hence the error posted.
Update
According to Steve's suggested solution, I ended up implementing the filter to avoid char/varchar considerations into my SUM statement with a little variant:
Every possible not INT string value has a length greater than 2 ('00' to '99' for my question numbers) so I use this filter to determine when I am going to apply the cast.
'RESULT' =
case when (
select sum(
case when len(c.answer_text1) <= 2 then
cast(isnull(c.answer_text1,'0') as int)
else
0
end
) +
sum(
case when len(c.answer_text2) <= 2 then
cast(isnull(c.answer_text2,'0') as int)
else
0
end
)
from Answers c
where c.idlocation=b.idlocation and c.question_number='05'
) > 0
then
1
else
0
end
This is an unfortunate result of how the SQL Server query processor/optimizer works. In my opinion, the SQL standard prohibits the calculation of SELECT list expressions before the rows that will contribute to the result set have been identified, but the optimizer considers plans that violate this prohibition.
What you're observing is an error in the evaluation of a SELECT list item on a row that is not in the result set of your query. While this shouldn't happen, it does, and it's somewhat understandable, because to protect against it in every situation would exclude many efficient query plans from consideration. The vast majority of SELECT expressions will never raise an error, regardless of data.
What you can do is try to protect against this with an additional CASE expression. To protect against strings with the '/' character, for example:
... SUM(CASE WHEN c.answer_text1 IS NOT NULL and c.answer_text1 NOT LIKE '%/%' THEN CAST(c.answer_text1 as int) ELSE 0 END)...
If you're using SQL Server 2012 or later, you have a better option: TRY_CONVERT:
...SUM(COALESCE(TRY_CONVERT(int,c.answer_text1),0))...
In your particular case, the overall database design is flawed, because numeric information should be stored in number-type columns. (This, of course, may not be your fault.) So redesign is an option, putting integer answers in integer-type columns and non-integer answer_text elsewhere. A compromise, if you can't redesign the tables, that I think will work, is to add a persisted computed column with value TRY_CONVERT(int,c.answer_text1) (or its best equivalent, based on what you know about the actual data in the table - perhaps the integer value of only columns containing no non-digit character and having length less than 9).
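The idea behind SUM(COALESCE(TRY_CONVERT(int, ...), 0)) can be sketched in plain Python: attempt the conversion, fall back to a null value on failure, and treat that null as 0 when summing. This is only an illustration of the logic, not SQL Server's implementation; try_int is a hypothetical helper, and Python's int() does not accept exactly the same strings as TRY_CONVERT.

```python
from typing import Optional


def try_int(text: Optional[str]) -> Optional[int]:
    """Return text as an int, or None when it is missing or not an integer."""
    if text is None:
        return None
    try:
        return int(text.strip())
    except ValueError:
        # Non-numeric text such as a date string ends up here.
        return None


# Hypothetical answer_text1 values, including the date from the error message.
answers = ["3", None, "27/12/2013", "0", "12"]

# 'or 0' plays the role of COALESCE(..., 0): None becomes 0 before summing.
total = sum(try_int(a) or 0 for a in answers)
print(total)
```

The date string and the missing value both contribute 0, so the sum never raises a conversion error.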
Your query appears correct enough, which means you have a Question 05 record with a datetime in either the answer_text1 or answer_text2 field.
Give this a shot to figure out which row has a date:
select *
from Answers
where question_number='05'
and (isdate(answer_text1) = 1 or isdate(answer_text2) = 1)
Furthermore, you could filter out any rows that have dates in them
where isdate(c.answer_text1) = 0
and isdate(c.answer_text2) = 0
and ...
Another option similar in nature to Steve's excellent answer is to filter your Answers table with a subquery like so:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from (select answer_text1, answer_text2, idlocation from Answers where question_number ='05') c
where b.idlocation=c.idlocation
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
More generally, though, you could just write the query like this:
select locations.idlocation,
case when sum(case when isnumeric(answer_text1) = 1 then answer_text1 else 0 end)
+ sum(case when isnumeric(answer_text2) = 1 then answer_text2 else 0 end) > 0
then 1 else 0 end as RESULT
from locations
inner join answers on answers.idlocation = locations.idlocation
where answers.question_number ='05'
group by locations.idlocation
Which would produce the same result.

Is it possible to use the result of a subquery in a case statement of the same outer query?

I am writing a search routine with a ranking algorithm and would like to get this in one pass.
My Ideal query would be something like this....
select *, (select top 1 wordposition
from wordpositions
where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos,
case when WordPos<11 then 1 else case WordPos<50 then 2 else case WordPos<100 then 3 else 4 end end end end as rank
from items
Is it possible to use WordPos in a case right there? It's generating an error on me , Invalid column name 'WordPos'.
I know I can redo the subquery for each case but I think it would actually re-run the case wouldn't it?
For example:
select *, case when (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<11 then 1 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<50 then 2 else case (select top 1 wordposition from wordpositions where recordid=items.pk_itemid and wordid=79588 and nextwordid=64502)<100 then 3 else 4 end end end end as rank from items
That works....but is it really re-running the identical query each time?
It's hard to tell from the tests as the first time it runs it's slow but subsequent runs are quick....it's caching...so would that mean that the first time it ran it for the first row, the subsequent three times it would get the result from cache?
Just curious what the best way to do this would be...
Thank you!
Ryan
You can do this using a subquery. I will stick with your SQL Server syntax, even though the question is tagged mysql:
select i.*,
(case when WordPos < 11 then 1
when WordPos < 50 then 2
when WordPos < 100 then 3
else 4
end) as rank
from (select i.*,
(select top 1 wp.wordposition
from wordpositions wp
where recordid=i.pk_itemid and wordid=79588 and nextwordid=64502
) as WordPos
from items i
) i;
This also simplifies the case statement. You do not need nested case statements to handle multiple conditions, just multiple when clauses.
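The derived-table approach can be tried with the following sketch. It runs on SQLite through Python's sqlite3 module, so LIMIT 1 replaces TOP 1 (an assumption made so the example is runnable), the output column is named rank_bucket instead of rank, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (pk_itemid INTEGER PRIMARY KEY);
    CREATE TABLE wordpositions (
        recordid INTEGER, wordid INTEGER, nextwordid INTEGER, wordposition INTEGER
    );
    INSERT INTO items VALUES (1), (2), (3);
    -- Item 3 deliberately has no matching word position.
    INSERT INTO wordpositions VALUES (1, 79588, 64502, 5), (2, 79588, 64502, 70);
    """
)

rows = conn.execute(
    """
    SELECT pk_itemid, WordPos,
           -- WordPos is usable here because the inner query defined it.
           CASE WHEN WordPos < 11 THEN 1
                WHEN WordPos < 50 THEN 2
                WHEN WordPos < 100 THEN 3
                ELSE 4
           END AS rank_bucket
    FROM (
        SELECT i.*,
               (SELECT wp.wordposition
                FROM wordpositions wp
                WHERE wp.recordid = i.pk_itemid
                  AND wp.wordid = 79588
                  AND wp.nextwordid = 64502
                LIMIT 1) AS WordPos
        FROM items i
    ) AS ranked
    ORDER BY pk_itemid
    """
).fetchall()
for r in rows:
    print(r)
conn.close()
```

An item with no matching row gets a NULL WordPos, every comparison against NULL fails to match, and the CASE falls through to ELSE 4.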
No. Identifiers introduced in the output clause (the fact that it comes from a sub-query is irrelevant) cannot be used within the same SELECT statement.
Here are some solutions:
1. Rewrite the query using a JOIN [1]. This will eliminate the issue entirely and fits well with RA.
2. Wrap the entire SELECT with the sub-query within another SELECT with the case. The outer select can access identifiers introduced by the inner SELECT's output clause.
3. Use a CTE (if SQL Server). This is similar to #2 in that it allows an identifier to be introduced.
While "re-writing" the sub-query for each case is very messy, it should still result in an equivalent plan (but view the query profile!), as the results of the query are non-volatile. As such, the equivalent sub-queries can be safely moved by the query planner, which should move the sub-query/sub-queries to a JOIN to avoid any "re-running" in the first place.
[1] Here is a conversion to use a JOIN, which is my preferred method. (I find that if a query can't be written in terms of a JOIN "easily" then it might be asking for the wrong thing or otherwise be showing issues with schema design.)
select
wp.wordposition as WordPos,
case wp.wordposition .. as Rank
from items i
left join wordpositions wp
on wp.recordid = i.pk_itemid
where wp.wordid = 79588
and wp.nextwordid = 64502
I've made assumptions about the multiplicity here (i.e. that wordid is unique) which should be verified. If this multiplicity is not valid and not correctable otherwise (and you're indeed using SQL Server), then I'd recommend using ROW_NUMBER() and a CTE.