Filter on a nested aggregate SUM function not working - sql

I have these two tables (the names have been pluralized for the sake of the example):
Table Locations:
idlocation varchar(12)
name varchar(50)
Table Answers:
idlocation varchar(6)
question_number varchar(3)
answer_text1 varchar(300)
answer_text2 varchar(300)
This table can hold answers for multiple locations, according to a list of numbered questions that repeats for each of them.
What I am trying to do is add up the values in the answer_text1 and answer_text2 columns for each location in the Locations table, but only for a specific question, and then output a value based on the result (1 or 0).
The query is as follows, using a nested select on Answers to perform the SUM operation:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from Answers c
where b.idlocation=c.idlocation and c.question_number='05'
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
In the Answers table I sometimes store a date string in the answer_text2 field, but for a different question number.
When I run the query I get the following error:
Conversion failed when converting the varchar value '27/12/2013' to data type int
I do have the value '27/12/2013' in the answer_text2 field, but for a different question. So it seems the question_number filter in the nested select is being ignored after b.idlocation=c.idlocation, and more questions are being added up than intended, hence the error above.
Update
Following Steve's suggested solution, I ended up guarding the cast inside my SUM to skip char/varchar values, with a small variation:
Every possible non-integer string value is longer than 2 characters ('00' to '99' for my question numbers), so I use this length check to decide when to apply the cast.
'RESULT' =
case when (
select sum(
case when len(c.answer_text1) <= 2 then
cast(isnull(c.answer_text1,'0') as int)
else
0
end
) +
sum(
case when len(c.answer_text2) <= 2 then
cast(isnull(c.answer_text2,'0') as int)
else
0
end
)
from Answers c
where c.idlocation=b.idlocation and c.question_number='05'
) > 0
then
1
else
0
end

This is an unfortunate result of how the SQL Server query processor/optimizer works. In my opinion, the SQL standard prohibits the calculation of SELECT list expressions before the rows that will contribute to the result set have been identified, but the optimizer considers plans that violate this prohibition.
What you're observing is an error in the evaluation of a SELECT list item on a row that is not in the result set of your query. While this shouldn't happen, it does, and it's somewhat understandable, because to protect against it in every situation would exclude many efficient query plans from consideration. The vast majority of SELECT expressions will never raise an error, regardless of data.
What you can do is try to protect against this with an additional CASE expression. To protect against strings with the '/' character, for example:
... SUM(CASE WHEN c.answer_text1 IS NOT NULL and c.answer_text1 NOT LIKE '%/%' THEN CAST(c.answer_text1 as int) ELSE 0 END)...
If you're using SQL Server 2012 or later, you have a better option, TRY_CONVERT:
...SUM(COALESCE(TRY_CONVERT(int,c.answer_text1),0))...
In your particular case, the overall database design is flawed, because numeric information should be stored in number-type columns. (This, of course, may not be your fault.) So redesign is an option: put integer answers in integer-type columns and non-integer answer text elsewhere. A compromise that I think will work, if you can't redesign the tables, is to add a persisted computed column with the value TRY_CONVERT(int,answer_text1) (or its best equivalent, based on what you know about the actual data in the table, perhaps the integer value of only those entries that contain no non-digit character and are shorter than 9 characters).
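As a rough illustration of the "only convert strings that contain nothing but digits" idea, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a runnable stand-in: SQLite has no TRY_CONVERT, so the guard is written as a CASE with a GLOB pattern, and the sample rows and the safe_int helper are made up for the example. On SQL Server the same shape would use TRY_CONVERT or a NOT LIKE '%[^0-9]%' test instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Answers (
        idlocation TEXT, question_number TEXT,
        answer_text1 TEXT, answer_text2 TEXT
    );
    -- Hypothetical sample data: one date string on a different question,
    -- one non-numeric answer and one empty string on question 05.
    INSERT INTO Answers VALUES
        ('A', '05', '3',   NULL),
        ('A', '07', '1',   '27/12/2013'),
        ('B', '05', '0',   '0'),
        ('B', '05', 'n/a', '');
""")

def safe_int(col):
    """Return a CASE expression that casts only non-empty, all-digit strings."""
    return (f"CASE WHEN {col} <> '' AND {col} NOT GLOB '*[^0-9]*' "
            f"THEN CAST({col} AS INTEGER) ELSE 0 END")

a1 = safe_int("answer_text1")
a2 = safe_int("answer_text2")

query = f"""
    SELECT idlocation,
           CASE WHEN SUM({a1}) + SUM({a2}) > 0 THEN 1 ELSE 0 END AS result
    FROM Answers
    WHERE question_number = '05'
    GROUP BY idlocation
    ORDER BY idlocation
"""
rows = conn.execute(query).fetchall()
print(rows)
```

NULLs, empty strings and anything containing a non-digit character all fall through to the ELSE 0 branch, so the cast never sees a value it cannot convert.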

Your query appears correct enough, which means you have a Question 05 record with a datetime in either the answer_text1 or answer_text2 field.
Give this a shot to figure out which row has a date:
select *
from Answers
where question_number='05'
and (isdate(answer_text1) = 1 or isdate(answer_text2) = 1)
Furthermore, you could filter out any rows that have dates in them:
where isdate(c.answer_text1) = 0
and isdate(c.answer_text2) = 0
and ...

Another option similar in nature to Steve's excellent answer is to filter your Answers table with a subquery like so:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from (select answer_text1, answer_text2, idlocation from Answers where question_number ='05') c
where b.idlocation=c.idlocation
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
More generally, though, you could write the query like this:
select locations.idlocation,
case when sum(case when isnumeric(answer_text1) = 1 then cast(answer_text1 as int) else 0 end)
+ sum(case when isnumeric(answer_text2) = 1 then cast(answer_text2 as int) else 0 end) > 0
then 1 else 0 end as RESULT
from locations
inner join answers on answers.idlocation = locations.idlocation
where answers.question_number ='05'
group by locations.idlocation
This would produce the same result. Note that ISNUMERIC also returns 1 for some strings that cannot be cast to int (such as '$' or '1e3'), so it is only a safe guard if such values cannot occur in your data.

Related

SQL CASE WHEN- can I do a function within a function? New to SQL

SELECT
SP.SITE,
SYS.COMPANY,
SYS.ADDRESS,
SP.CUSTOMER,
SP.STATUS,
DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) AS MONTH_COUNT
CASE WHEN(MONTH_COUNT = 0 THEN MONTH_COUNT = DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES) AS DAY_COUNT)
ELSE NULL
END
FROM SALEPASSES AS SP
INNER JOIN SYSTEM AS SYS ON SYS.SITE = SP.SITE
WHERE STATUS IN (7,27,29);
I am still trying to understand SQL. Is this the right order to have everything? I'm assuming my datediff() is unable to work because it's inside case when. What I am trying to do, is get the day count if month_count is less than 1 (meaning it's less than one month and we need to count the days between the dates instead). I need month_count to run first to see if doing the day_count would even be necessary. Please give me feedback, I'm new and trying to learn!
CASE is an expression that returns a value. It looks like you should be doing this:
DAY_COUNT =
CASE WHEN DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) = 0
THEN DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES)
ELSE NULL END
You don't actually need ELSE NULL, as NULL is the default.
Note also that you [usually] cannot refer to a derived column in the same select.
It appears that what you are trying to do is define the MonthCount column's value, and then reuse that value in another column's definition. (The Don't Repeat Yourself principle.)
In most dialects of SQL, including MS SQL Server, you can't do that.
That's because SQL is a "declarative" language. This means that SQL Server is free to calculate the column values in any order that it likes. In turn, that means you're not allowed to do anything that would rely on one column being calculated before another.
There are two basic ways around that...
First, use CTEs or sub-queries to create two different "scopes", allowing you to define MonthCount before DayCount, and so reuse the value without retyping the definition.
SELECT
*,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
(
SELECT
*,
bar AS MonthCount
FROM
x
)
AS derive_month
The second main way is to derive the value before the SELECT list is evaluated. In this case, by using APPLY to 'join' a single value onto each input row...
SELECT
x.*,
MonthCount,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
x
CROSS APPLY
(
SELECT
bar AS MonthCount
)
AS derive_month
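To make the first (sub-query) pattern concrete, here is a minimal runnable sketch using Python's built-in sqlite3 module as a stand-in database. SQLite has neither DATEDIFF nor CROSS APPLY, so the month and day differences are approximated with strftime and julianday; the table contents are invented for the example. The point is only the scoping: MonthCount is defined in the inner query and reused by name in the outer one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalePasses (customer TEXT, membersince TEXT, expires TEXT);
    -- Hypothetical rows: one pass shorter than a month, one spanning months.
    INSERT INTO SalePasses VALUES
        ('short', '2024-03-05', '2024-03-20'),
        ('long',  '2024-01-15', '2024-04-01');
""")

query = """
    SELECT customer,
           MonthCount,
           -- MonthCount is visible here because it was defined one scope down.
           CASE WHEN MonthCount = 0
                THEN CAST(julianday(expires) - julianday(membersince) AS INTEGER)
           END AS DayCount
    FROM (
        SELECT *,
               -- Count month boundaries crossed, similar in spirit to DATEDIFF(MONTH, ...).
               (CAST(strftime('%Y', expires) AS INTEGER)
                  - CAST(strftime('%Y', membersince) AS INTEGER)) * 12
             + (CAST(strftime('%m', expires) AS INTEGER)
                  - CAST(strftime('%m', membersince) AS INTEGER)) AS MonthCount
        FROM SalePasses
    ) AS derive_month
    ORDER BY customer
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The 'long' row gets a MonthCount of 3 and a NULL DayCount, while the 'short' row gets a MonthCount of 0 and a DayCount of 15.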

PostgreSQL - Handling empty query result

I am quite new to SQL and am currently working on some survey results with PostgreSQL. I need to calculate the percentage of each option on a 5-point scale for all survey questions. I have a table with respondentid, questionid and the question response value. The demographic info needed for filtering a data cut is retrieved from another table. The query output is then passed to a result table. All query texts for specific data cuts are generated by a VBA script.
It works OK in general, but there is one problematic case: when there are no respondents for a specific cut, I receive an empty table as the query result. If the respondent count is greater than 0 but lower than the calculation threshold (5 respondents), I get a table full of NULLs, which is OK. For 0 respondents I get 0 rows, so nothing is passed to the result table, which causes some displacement in the final table. I am able to track such cuts, as I also calculate the respondent count for the overall data cut and store it in another table. But is there anything I can do at this point, such as somehow generating a table full of NULLs that could be inserted into the result table when needed?
Thanks in advance and sorry for clumsiness in code.
WITH ItemScores AS (
SELECT
rsp.questionid,
CASE WHEN SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) < 5 THEN
NULL
ELSE
ROUND(SUM(CASE WHEN rsp.respvalue = 5 THEN 1 ELSE 0 END)/CAST(SUM(CASE
WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS DECIMAL),2)
END AS 5spercentage,
... and so on for frequencies of 1s,2s,3s and 4s
SUM(CASE WHEN rsp.respvalue >= 0 THEN 1 ELSE 0 END) AS QuestionTotalAnswers
FROM (
some filtering applied here [...]
) AS rsp
GROUP BY rsp.questionid
ORDER BY rsp.questionid
)
INSERT INTO results_items SELECT * from ItemScores;
If you want to ensure that every questionid appears in the output, build the full list of question ids first and then left join it to the CTE that does the aggregations and calculations. The list is always generated, and the calculated values are joined onto it wherever they exist.
A conceptual example would be something like:
with calcs as (
select questionid, sum(respvalue) as sum_per_question
from rsp
group by questionid)
select distinct rsp.questionid, calcs.sum_per_question
from rsp
left join calcs on rsp.questionid = calcs.questionid
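A runnable sketch of the same idea follows, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The questions table, the region column and the sample rows are all hypothetical, introduced only to show that a cut with zero respondents still yields one row per question, filled with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: a master list of questions plus the responses.
    CREATE TABLE questions (questionid INTEGER PRIMARY KEY);
    CREATE TABLE responses (respondentid INTEGER, questionid INTEGER,
                            respvalue INTEGER, region TEXT);
    INSERT INTO questions VALUES (1), (2);
    INSERT INTO responses VALUES
        (10, 1, 5, 'north'), (11, 1, 4, 'north'), (10, 2, 3, 'north');
""")

query = """
    WITH calcs AS (
        -- The filtered aggregation; it may return zero rows for an empty cut.
        SELECT questionid, COUNT(*) AS answers, SUM(respvalue) AS sum_per_question
        FROM responses
        WHERE region = ?
        GROUP BY questionid
    )
    -- Driving from the full question list keeps one row per question.
    SELECT q.questionid, calcs.answers, calcs.sum_per_question
    FROM questions AS q
    LEFT JOIN calcs ON calcs.questionid = q.questionid
    ORDER BY q.questionid
"""
north = conn.execute(query, ("north",)).fetchall()
south = conn.execute(query, ("south",)).fetchall()  # a cut with no respondents
print(north)
print(south)
```

The important design choice is that the filter lives inside the CTE rather than in the outer WHERE clause; filtering the outer query on a column from calcs would discard the NULL rows again.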

Using Case to sum NULL instances gives missing expression error

I'm attempting to generate a list of vehicles that don't have a price or mileage listed using the below query. When I attempt to run the query, I get an error "ORA-00936: missing expression", but can't seem to find out why. From other posts here, I can see that using IS NULL should be the appropriate term for the WHEN portion, but I am not seeing anything wrong with the query itself. Any help would be appreciated!
Select
SUM(CASE vehicles.mileage WHEN IS NULL THEN 1 ELSE 0 END) NO_MILEAGE,
SUM(CASE vehicles.price WHEN IS NULL THEN 1 ELSE 0 END) NO_PRICE
From
[data]
Simple syntax error:
Select
SUM(CASE WHEN vehicles.mileage IS NULL THEN 1 ELSE 0 END) NO_MILEAGE,
SUM(CASE WHEN vehicles.price IS NULL THEN 1 ELSE 0 END) NO_PRICE
From
[data];
This assumes a table named vehicles in your FROM clause, or a column with an object or nested table type in [data] named vehicles. Otherwise the qualification vehicles. would not make sense.
Use a "searched" CASE for a decision between two alternatives.
Details about "simple" and "searched" CASE in the Oracle online reference.
You can also use COUNT for your particular case. The online reference again:
If you specify expr, then COUNT returns the number of rows where expr is not null.
If you specify the asterisk (*), then this function returns all rows,
including duplicates and nulls. COUNT never returns null.
So you need the difference:
Select
COUNT(*) - COUNT(vehicles.mileage) AS NO_MILEAGE,
COUNT(*) - COUNT(vehicles.price) AS NO_PRICE
From
[data];
You could also use Oracle's NVL2 function:
Select
SUM(NVL2(vehicles.mileage, 0, 1)) NO_MILEAGE,
SUM(NVL2(vehicles.price, 0, 1)) NO_PRICE
From
[data]
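The searched CASE and the COUNT difference should agree on any data. A quick check, sketched with Python's built-in sqlite3 module as a stand-in for Oracle (NVL2 is Oracle-specific and is left out here; the vehicles rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicles (id INTEGER, mileage INTEGER, price INTEGER);
    -- Hypothetical rows: one missing mileage, two missing prices.
    INSERT INTO vehicles VALUES (1, 1000, NULL), (2, NULL, NULL), (3, 500, 9000);
""")

# Searched CASE: add 1 for every row where the column is NULL.
by_case = conn.execute("""
    SELECT SUM(CASE WHEN mileage IS NULL THEN 1 ELSE 0 END) AS no_mileage,
           SUM(CASE WHEN price   IS NULL THEN 1 ELSE 0 END) AS no_price
    FROM vehicles
""").fetchone()

# COUNT(*) counts all rows, COUNT(col) only the non-NULL ones.
by_count = conn.execute("""
    SELECT COUNT(*) - COUNT(mileage) AS no_mileage,
           COUNT(*) - COUNT(price)   AS no_price
    FROM vehicles
""").fetchone()

print(by_case, by_count)
```

One difference worth knowing: on an empty table the SUM version returns NULL, while the COUNT version returns 0.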

Return NULL instead of 0 when using COUNT(column) SQL Server

I have a query that runs fine and does two types of work, COUNT and SUM.
Something like
select
id,
Count (contracts) as countcontracts,
count(something1),
count(something1),
count(something1),
sum(cost) as sumCost
from
table
group by
id
My problem is: if there is no contract for a given ID, it returns 0 for COUNT and NULL for SUM. I want to see NULL instead of 0.
I was thinking about case when Count (contracts) = 0 then null else Count (contracts) end, but I don't want to do it this way because I have more than 12 count positions in the query and it's processing a large number of records, so I think it may slow down query performance.
Are there any other ways to replace 0 with NULL?
Try this:
select NULLIF ( Count(something) , 0)
Here are three methods:
1. (case when count(contracts) > 0 then count(contracts) end) as countcontracts
2. sum(case when contracts is not null then 1 end) as countcontracts
3. nullif(count(contracts), 0)
All three of these require writing more complicated expressions. However, this really isn't that difficult. Just copy the line multiple times, and change the name of the variable on each one. Or, take the current query, put it into a spreadsheet and use spreadsheet functions to make the transformation. Then copy the function down. (Spreadsheets are really good code generators for repeated lines of code.)
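For a side-by-side view of what these expressions return, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (NULLIF, COUNT and CASE behave the same way for this purpose). The table name t and its rows are made up; id 2 has no contracts at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, contracts TEXT, cost INTEGER);
    -- Hypothetical data: id 1 has two contracts, id 2 has none.
    INSERT INTO t VALUES (1, 'c1', 10), (1, 'c2', 5), (2, NULL, NULL);
""")

rows = conn.execute("""
    SELECT id,
           COUNT(contracts)                                AS plain_count,
           NULLIF(COUNT(contracts), 0)                     AS via_nullif,
           SUM(CASE WHEN contracts IS NOT NULL THEN 1 END) AS via_sum,
           SUM(cost)                                       AS sumCost
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)
```

For id 2 the plain COUNT gives 0, while both the NULLIF and the SUM(CASE ...) variants give NULL, matching what SUM(cost) already does.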

SQL select statement filtering

OK, so I'm trying to select a number of rows from a column that holds the value 3, but only if there are no rows containing 10 or 4. If there are rows containing 10 or 4, I only want to show those.
What would be a good syntax to do that? So far I've been attempting a CASE WHEN statement, but I can't seem to figure it out.
Any help would be greatly appreciated.
(My database is in an MS SQL 2008 server)
Use a union all:
select
-- columns
from YourTable
where YourColumn = 3 and not exists (
select 1 from YourTable where YourColumn = 10 or YourColumn = 4)
union all
select
-- columns
from YourTable
where YourColumn = 10 or YourColumn = 4
FYI: Original question title was "SQL CASE WHEN NULL - question"
CASE WHEN YourColumn IS NULL THEN x ELSE y END
Since there is nothing that compares to NULL and returns true (not even NULL itself), you can't do
CASE YourColumn WHEN NULL THEN x ELSE y END
only
CASE ISNULL(YourColumn, '') WHEN '' THEN x ELSE y END
but then you lose the ability to differentiate between NULL and the (in this example) empty string.
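The behaviour is easy to confirm. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, with COALESCE in place of SQL Server's ISNULL; the literal values are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute("""
    SELECT
        -- Simple CASE compares with '=', and NULL = NULL is not true.
        CASE NULL WHEN NULL THEN 'x' ELSE 'y' END           AS simple_case,
        -- Searched CASE with IS NULL matches as intended.
        CASE WHEN NULL IS NULL THEN 'x' ELSE 'y' END        AS searched_case,
        -- Mapping NULL to '' works, but NULL and '' become indistinguishable.
        CASE COALESCE(NULL, '') WHEN '' THEN 'x' ELSE 'y' END AS null_as_empty,
        CASE COALESCE('', '')   WHEN '' THEN 'x' ELSE 'y' END AS real_empty
""").fetchone()
print(row)
```

Only the first expression falls through to the ELSE branch; the last two columns show how the COALESCE trick treats NULL and the empty string identically.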
Depending on the size of your table and its indexes, it may be more efficient to calculate which values you want before the query:
declare @UseThree as bit = 1;
if exists (select 1 from testtable where rowval in (10,4))
set @UseThree = 0;
select COUNT(*)
from testtable
where (@UseThree = 1 AND rowval=3)
OR
(@UseThree = 0 AND rowval in (10,4))
The simplest solution would be to do this in two queries:
SELECT ... FROM YourTable WHERE SomeColumn IN (10,4)
If and only if the above query yields no results, then run the second query:
SELECT ... FROM YourTable WHERE SomeColumn = 3
Running two queries may seem "inelegant" but it has advantages:
It's easy to code
It's easy to debug
It often has better performance than a very complex solution
It's easy to understand for a programmer who has to maintain the code after you.
Running two queries may seem like it has extra overhead, but also consider that you won't run the second query every time -- only if the first query has an empty result. If you use an expensive single-query solution, remember that it will incur that expense every time.
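In application code the two-query approach could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for the real database driver; the function name fetch_preferred, the id column and the sample rows are hypothetical, while testtable and rowval are borrowed from the earlier answer.

```python
import sqlite3

def fetch_preferred(conn):
    """Return the rows holding 10 or 4 if any exist, otherwise the rows holding 3."""
    rows = conn.execute(
        "SELECT id, rowval FROM testtable WHERE rowval IN (10, 4) ORDER BY id"
    ).fetchall()
    if rows:
        return rows
    # The fallback query only runs when the first one came back empty.
    return conn.execute(
        "SELECT id, rowval FROM testtable WHERE rowval = 3 ORDER BY id"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testtable (id INTEGER PRIMARY KEY, rowval INTEGER);
    INSERT INTO testtable (rowval) VALUES (3), (3), (7);
""")

only_threes = fetch_preferred(conn)   # no 10 or 4 yet, so the 3s are returned
conn.execute("INSERT INTO testtable (rowval) VALUES (10)")
with_ten = fetch_preferred(conn)      # now only the row holding 10 is returned
print(only_threes, with_ten)
```

If other sessions can modify the table between the two statements, the pair would need to run inside one transaction at a suitable isolation level to stay consistent.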