How to sum in SQL with conditions? - sql

It's more than likely that this question has been asked before and that I just couldn't find it, not knowing what to search for.
Assume I have a simple table with two columns: one holds one of two values, e.g. "positive" or "negative", and the other holds an integer.
Is there a way, using standard SQL, to calculate the sum of all the numbers in the second column, where a number is added if the first column reads "positive" and subtracted if it reads "negative"?
Also, it would be interesting to know how to do the same in MS Access, if it differs from standard SQL.

You can use sum(case):
select sum(case when col1 = '+' then value
when col1 = '-' then - value
end) as overall
from t;
In MS Access, you would use switch() or iif(). Switch() takes condition/value pairs, so a final true, 0 pair acts as the default:
select sum(switch(col1 = '+', value, col1 = '-', - value, true, 0)) as overall
from t;
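A quick way to check the sum(case) approach is to run it against a throwaway SQLite database from Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for a standard SQL engine, and the table t with columns kind and value is hypothetical, using the question's "positive"/"negative" labels instead of '+'/'-'.

```python
import sqlite3

# Hypothetical table: kind is 'positive' or 'negative', value is an integer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (kind TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("positive", 10), ("negative", 3), ("positive", 5), ("negative", 4)],
)

# Add value for 'positive' rows and subtract it for 'negative' rows.
# Rows matching neither WHEN yield NULL, which SUM ignores.
overall = conn.execute(
    """
    SELECT SUM(CASE WHEN kind = 'positive' THEN value
                    WHEN kind = 'negative' THEN -value
               END) AS overall
    FROM t
    """
).fetchone()[0]
print(overall)  # 8
```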

SQL CASE WHEN- can I do a function within a function? New to SQL

SELECT
SP.SITE,
SYS.COMPANY,
SYS.ADDRESS,
SP.CUSTOMER,
SP.STATUS,
DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) AS MONTH_COUNT
CASE WHEN(MONTH_COUNT = 0 THEN MONTH_COUNT = DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES) AS DAY_COUNT)
ELSE NULL
END
FROM SALEPASSES AS SP
INNER JOIN SYSTEM AS SYS ON SYS.SITE = SP.SITE
WHERE STATUS IN (7,27,29);
I am still trying to understand SQL. Is this the right order to have everything in? I'm assuming my DATEDIFF() can't work because it's inside CASE WHEN. What I am trying to do is get the day count if month_count is less than 1 (meaning the period is shorter than one month, so we need to count the days between the dates instead). I need month_count to be calculated first to see whether the day_count is even necessary. Please give me feedback; I'm new and trying to learn!
CASE is an expression: it returns a value. It looks like you should be doing this:
DAY_COUNT =
CASE WHEN DATEDIFF(MONTH,SP.MEMBERSINCE, SP.EXPIRES) = 0
THEN DATEDIFF(DAY,SP.MEMBERSINCE, SP.EXPIRES)
ELSE NULL END
You don't actually need ELSE NULL, as NULL is the default when no WHEN branch matches.
Note also that you (usually) cannot refer to a derived column alias elsewhere in the same SELECT list.
It appears that what you are trying to do is define the MonthCount column's value, and then reuse that value in another column's definition. (The Don't Repeat Yourself principle.)
In most dialects of SQL, including MS SQL Server, you can't do that.
That's because SQL is a "declarative" language. This means that SQL Server is free to calculate the column values in any order that it likes. In turn, that means you're not allowed to do anything that would rely on one column being calculated before another.
There are two basic ways around that...
First, use CTEs or sub-queries to create two different "scopes", allowing you to define MonthCount before DayCount, and so reuse the value without retyping the definition.
SELECT
*,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
(
SELECT
*,
bar AS MonthCount
FROM
x
)
AS derive_month
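To see the derived-table approach on the actual problem, here is a sketch run on SQLite through Python's sqlite3 module. Assumptions: SQLite has no DATEDIFF, so the month count is computed as the number of month boundaries crossed (year difference times 12 plus month difference, which is what DATEDIFF(MONTH, ...) counts in SQL Server), and the day count uses julianday(); the salepasses table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salepasses (customer TEXT, membersince TEXT, expires TEXT)")
conn.executemany(
    "INSERT INTO salepasses VALUES (?, ?, ?)",
    [
        ("a", "2024-01-10", "2024-01-25"),  # same calendar month
        ("b", "2024-01-10", "2024-03-05"),  # crosses two month boundaries
    ],
)

rows = conn.execute(
    """
    SELECT customer,
           month_count,
           -- month_count is visible here because the inner query defined it.
           CASE WHEN month_count = 0
                THEN CAST(julianday(expires) - julianday(membersince) AS INTEGER)
           END AS day_count
    FROM (
        SELECT customer, membersince, expires,
               (CAST(strftime('%Y', expires) AS INTEGER)
                  - CAST(strftime('%Y', membersince) AS INTEGER)) * 12
             + (CAST(strftime('%m', expires) AS INTEGER)
                  - CAST(strftime('%m', membersince) AS INTEGER)) AS month_count
        FROM salepasses
    ) AS derive_month
    ORDER BY customer
    """
).fetchall()
print(rows)  # [('a', 0, 15), ('b', 2, None)]
```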
The second main way is to derive the value before the SELECT list is evaluated. In this case, that means using APPLY to 'join' a single value onto each input row:
SELECT
x.*,
MonthCount,
CASE WHEN MonthCount = 0 THEN foo ELSE NULL END AS DayCount
FROM
x
CROSS APPLY
(
SELECT
bar AS MonthCount
)
AS derive_month

IIF Function returning incorrect calculated values - SQL Server

I am writing a query to show the returns of placing each-way bets on horse races.
There is an issue with the PlaceProfit result: it should show a return if the horse's finishing position is between 1 and 4, and a loss if the position is 5 or higher.
It shows the correct result for finishing positions up to 9th, but 10th place and above is being counted as a win.
I include my code below along with the output.
ALTER VIEW EachWayBetting
AS
SELECT a.ID,
RaceDate,
runners,
track.NAME AS Track,
horse.NAME as HorseName,
IndustrySP,
Place AS 'FinishingPosition',
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
IIF(A.Place = '1', 1.0 * (A.IndustrySP-1), '-1') AS WinProfit,
IIF(A.Place <='4', 1.0 * (A.IndustrySP-1)/5, '-1') AS PlaceProfit
FROM dbo.NewRaceResult a
LEFT OUTER JOIN track ON track.ID = A.TrackID
LEFT OUTER JOIN horse ON horse.ID = A.HorseID
WHERE a.Runners > 22
This returns:
As I mention in the comments, the problem is your choice of data type for Place: it's varchar. The ordering for a string data type is completely different from that of a numerical data type. Strings are compared character by character from left to right, in the order the characters are sorted in the collation you are using. Numerical data types, however, are ordered from lowest to highest value.
This means that, for a numerical data type, the value 2 is lower than 10; for a varchar, however, '2' is higher than '10'. That's because the varchar comparison looks at the first character first: '2' is higher than '1', so '2' is higher than '10', regardless of the characters that follow.
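The string-versus-number ordering is easy to reproduce. This sketch uses SQLite via Python's sqlite3 as a stand-in for SQL Server (SQLite compares text with its BINARY collation rather than a SQL Server collation, but digits order the same way); the results table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Place stored as text, like the question's varchar column.
conn.execute("CREATE TABLE results (place TEXT)")
conn.executemany("INSERT INTO results VALUES (?)",
                 [("1",), ("4",), ("9",), ("10",), ("12",)])

# Text comparison: '10' <= '4' is true because '1' sorts before '4'.
as_text = [r[0] for r in conn.execute(
    "SELECT place FROM results WHERE place <= '4' ORDER BY rowid")]

# Numeric comparison after converting the text to an integer.
as_int = [r[0] for r in conn.execute(
    "SELECT place FROM results WHERE CAST(place AS INTEGER) <= 4 ORDER BY rowid")]

print(as_text)  # ['1', '4', '10', '12']
print(as_int)   # ['1', '4']
```

The text comparison lets 10th and 12th through the <= '4' test, which matches the symptom described in the question.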
The solution here is simple: fix your design and store numerical data in a numerical data type (int seems appropriate here). You're also breaking normal form rules, as you're storing other data in the column, namely the reason a horse failed to be classified. Such data isn't a "Place" but information on why the horse didn't place, and so it should be in a separate column.
You can therefore fix this by first adding a new column, then moving the non-numeric values into it so that Place only contains numerical data, and finally altering the type of the Place column.
ALTER TABLE dbo.YourTable ADD UnClassifiedReason varchar(5) NULL; --Obviously use an appropriate length.
GO
UPDATE dbo.YourTable
SET Place = TRY_CONVERT(int,Place),
UnClassifiedReason = CASE WHEN TRY_CONVERT(int,Place) IS NULL THEN Place END;
GO
ALTER TABLE dbo.YourTable ALTER COLUMN Place int NULL;
GO
If Place does not allow NULL values, you will need to ALTER the column first to allow them.
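The same split can be tried on SQLite from Python, as a rough sketch. Assumptions: SQLite's ALTER TABLE cannot change a column's type, so the integers go into a new place_num column instead of converting Place in place; a GLOB test (non-empty, digits only) stands in for TRY_CONVERT; and the table name, column names, and the non-finisher codes 'PU' and 'F' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, place TEXT)")
conn.executemany("INSERT INTO results (place) VALUES (?)",
                 [("1",), ("10",), ("PU",), ("F",), ("3",)])

# Step 1: add the new columns.
conn.execute("ALTER TABLE results ADD COLUMN place_num INTEGER")
conn.execute("ALTER TABLE results ADD COLUMN unclassified_reason TEXT")

# Step 2: split the data. A value counts as numeric only if it is
# non-empty and contains no non-digit character.
conn.execute("""
UPDATE results
SET place_num = CASE WHEN place <> '' AND place NOT GLOB '*[^0-9]*'
                     THEN CAST(place AS INTEGER) END,
    unclassified_reason = CASE WHEN place = '' OR place GLOB '*[^0-9]*'
                               THEN place END
""")

rows = conn.execute(
    "SELECT place, place_num, unclassified_reason FROM results ORDER BY id"
).fetchall()
print(rows)
# [('1', 1, None), ('10', 10, None), ('PU', None, 'PU'), ('F', None, 'F'), ('3', 3, None)]
```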
In addition to fixing the data as Larnu suggests, you should also fix the query:
SELECT nrr.ID, nrr.RaceDate, nrr.runners,
t.NAME AS Track, h.NAME AS HorseName, nrr.IndustrySP,
nrr.Place AS FinishingPosition,
-- // calculates returns on the win & place parts of an each way bet with 1/5 place terms //
(CASE WHEN nrr.Place = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN nrr.Place <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
FROM dbo.NewRaceResult nrr LEFT JOIN
track t
ON t.ID = nrr.TrackID LEFT JOIN
horse h
ON h.ID = nrr.HorseID
WHERE nrr.Runners > 22;
The important changes are removing single quotes from numbers and column names. It seems you need to understand the differences among strings, numbers, and identifiers.
Other changes are:
Meaningful table aliases, rather than meaningless letters such as a.
Qualifying all column references, so it is clear where columns are coming from.
Switching from IIF() to CASE. IIF() is non-standard (in SQL Server it is shorthand for a CASE expression); CASE is standard SQL for conditional expressions (both work fine).
Making sure that the types returned by all branches of the conditional expressions are consistent.
Note: This version will work even if you don't change the type of Place. The strings will be converted to numbers in the appropriate places. I don't advocate relying on such silent conversion, so I recommend fixing the data.
If Place can contain non-numeric values, use TRY_CONVERT() so that those values become NULL instead of raising a conversion error:
(CASE WHEN TRY_CONVERT(int, nrr.Place) = 1 THEN (nrr.IndustrySP - 1.0) ELSE -1 END) AS WinProfit,
(CASE WHEN TRY_CONVERT(int, nrr.Place) <= 4 THEN (nrr.IndustrySP - 1.0) / 5 ELSE -1 END) AS PlaceProfit
But the important point is to fix the data.
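Once Place is numeric, the CASE expressions from this answer can be sanity-checked on a few rows. This sketch runs them on SQLite through Python's sqlite3; the race_result table and its values are hypothetical, and because SQLite is dynamically typed the ELSE branch is written as -1.0 so every branch returns a REAL (SQL Server would promote the -1 itself).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE race_result (id INTEGER PRIMARY KEY, place INTEGER, industry_sp REAL)"
)
# NULL place represents an unclassified runner.
conn.executemany(
    "INSERT INTO race_result (place, industry_sp) VALUES (?, ?)",
    [(1, 6.0), (3, 11.0), (10, 3.0), (None, 21.0)],
)

rows = conn.execute("""
    SELECT id,
           -- Win part: profit only for first place.
           CASE WHEN place = 1 THEN industry_sp - 1.0 ELSE -1.0 END AS win_profit,
           -- Place part: 1/5 odds for places 1 to 4.
           CASE WHEN place <= 4 THEN (industry_sp - 1.0) / 5 ELSE -1.0 END AS place_profit
    FROM race_result
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 5.0, 1.0), (2, -1.0, 2.0), (3, -1.0, -1.0), (4, -1.0, -1.0)]
```

A NULL place makes both WHEN conditions unknown, so the ELSE branch records a loss.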

Sql column value as formula in select

Can I select a column based on another column's value being listed as a formula? So I have a table, something like:
column_name | formula   | val
one         | NULL      | 1
two         | NULL      | 2
three       | one + two | NULL
And I want to do
SELECT
column_name,
CASE WHEN formula IS NULL
THEN val
ELSE
(Here's where I'm confused - How do I evaluate the formula?)
END as result
FROM
table
And end up with a result set like
column_name | result
one         | 1
two         | 2
three       | 3
You keep saying column, and column name, but you're actually talking about rows, not columns.
The problem is that you (potentially) want a different formula for each row. For example, row 4 might be (two - one) = 1, or even (three + one) = 4, in which case you'd have to calculate row three before you could do row 4. This means that a simple SELECT query that parses the formulas is going to be very hard to write: it would have to handle every type of formula, and formulas that reference other formulas only make it harder.
If you have to handle expressions like (two + one) * five = 15 and two + one * five = 7, then you'd basically be re-implementing a full-blown eval function. You might be better off returning the SQL table to another language that has an eval function built in, or you could use something like SQL Eval.net if it has to be done in SQL.
Either way, though, you've still got to change "two + one" to "2 + 1" before you can do the eval with it. Because these values are in other rows, you can't see those values in the row you're looking at. To get the value for "one" you have to do something like
Select val from table where column_name = 'one'
And even then, if that val is NULL, it hasn't been calculated yet, and you have to come back and try again later.
If I had to do something like this, I would create a temporary table and load the basic table into it. Then I'd iterate over the rows with NULL values, trying to replace column names with the literal values. I'd run the eval over any formulas that no longer contained names, setting the val for those rows. If there were still rows with no val (i.e. they were waiting for another row to be done first), I'd go back and iterate again. At the end, you should have a val for every row, at which point getting your results is a simple query.
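The iterative approach above is straightforward to sketch outside SQL. The Python below is a minimal illustration, not the answerer's code. It assumes formulas only use +, -, *, / and parentheses over numbers and row names that are valid Python identifiers; it parses them with the standard ast module instead of a raw eval() (so arbitrary code can't run) and keeps retrying rows whose dependencies aren't computed yet. The function names resolve and evaluate and the extra row four are hypothetical.

```python
import ast
import operator

# Arithmetic operators the (assumed) formula syntax may use.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, values):
    """Evaluate a parsed formula. Raises KeyError if it references a row
    whose value has not been computed yet."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, values)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return values[node.id]  # row name -> its computed value
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        left = evaluate(node.left, values)
        right = evaluate(node.right, values)
        return BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand, values)
    raise ValueError(f"unsupported formula syntax: {ast.dump(node)}")


def resolve(rows):
    """rows: (column_name, formula, val) tuples, like the question's table.
    Returns {column_name: value}, evaluating formulas pass by pass."""
    values = {name: val for name, formula, val in rows if formula is None}
    pending = {
        name: ast.parse(formula, mode="eval")
        for name, formula, _ in rows
        if formula is not None
    }
    while pending:
        progressed = False
        for name, tree in list(pending.items()):
            try:
                values[name] = evaluate(tree, values)
            except KeyError:
                continue  # depends on a row not computed yet; retry next pass
            del pending[name]
            progressed = True
        if not progressed:
            # Nothing could be computed this pass: a missing name or a cycle.
            raise ValueError(f"cannot resolve: {sorted(pending)}")
    return values


rows = [
    ("one", None, 1),
    ("two", None, 2),
    ("three", "one + two", None),
    ("four", "three + one", None),
]
print(resolve(rows))  # {'one': 1, 'two': 2, 'three': 3, 'four': 4}
```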
A possible solution would be something like this. Since you mentioned very few details, it works for the condition above, but I'm not sure about anything else.
GO
SELECT
t1.column_name,
CASE WHEN t1.formula IS NULL
THEN t1.val
ELSE
(select sum(t2.val) from table as t2 where t2.formula is null)
END as result
FROM
table as t1
GO
If this is not working feel free to discuss it further.
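The scalar-subquery idea can be checked on SQLite from Python. Rows that have a formula have a NULL val, so the subquery has to sum the rows where formula IS NULL. This is only a sketch: table is a reserved word, so the hypothetical table is named t, and like the answer it only handles a formula that adds up all the base rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_name TEXT, formula TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("one", None, 1), ("two", None, 2), ("three", "one + two", None)],
)

rows = conn.execute("""
    SELECT t1.column_name,
           CASE WHEN t1.formula IS NULL
                THEN t1.val
                -- Sum the base rows; formula rows have a NULL val.
                ELSE (SELECT SUM(t2.val) FROM t AS t2 WHERE t2.formula IS NULL)
           END AS result
    FROM t AS t1
    ORDER BY t1.rowid
""").fetchall()
print(rows)  # [('one', 1), ('two', 2), ('three', 3)]
```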

Find columns with non zero values in Hive

Let's say you have a table with n columns, col1 to coln, where n is large.
Would it be possible to find all rows where at least one column from colk to coln has a non-zero value (assuming the columns hold non-negative numbers, and some values may be missing)?
You could use Hive's coalesce function (which returns the first non-null value of a series of column inputs) combined with the if() function, something like:
select *
from table
where coalesce(if(col1 > 0, 1, null), if(col2 > 0, 1, null)...) = 1
;
The above will return any row where at least one column specified in the coalesce function returned a value of 1. Let me know if this works for you.
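Because n is large, it may help to generate the coalesce predicate rather than type it. This Python sketch builds it for columns colk to coln and runs it on SQLite as a stand-in for Hive. Assumptions: CASE WHEN replaces Hive's if() (SQLite has no if()), a trailing 0 is added so COALESCE always gets at least two arguments and all-zero rows compare 0 = 1, and the table t, the helper nonzero_predicate, and the sample rows are hypothetical.

```python
import sqlite3


def nonzero_predicate(columns):
    """Build COALESCE(CASE WHEN c > 0 THEN 1 END, ..., 0) = 1 for the columns.
    Column names must come from trusted code, not user input."""
    parts = ", ".join(f"CASE WHEN {c} > 0 THEN 1 END" for c in columns)
    return f"COALESCE({parts}, 0) = 1"


n, k = 5, 3  # check col3..col5
cols = [f"col{i}" for i in range(1, n + 1)]

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE t (id INTEGER PRIMARY KEY, {', '.join(c + ' INTEGER' for c in cols)})")
placeholders = ", ".join("?" for _ in range(n + 1))
conn.executemany(
    f"INSERT INTO t VALUES ({placeholders})",
    [
        (1, 5, 0, 0, 0, 0),           # only col1 is non-zero -> excluded
        (2, 0, 0, None, 2, None),     # col4 non-zero -> included
        (3, 0, 0, None, None, None),  # nothing non-zero -> excluded
        (4, 0, 0, 0, 0, 7),           # col5 non-zero -> included
    ],
)

query = f"SELECT id FROM t WHERE {nonzero_predicate(cols[k - 1:])} ORDER BY id"
ids = [row[0] for row in conn.execute(query)]
print(ids)  # [2, 4]
```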
Edit: Another method which is a little cleaner (doesn't require you to list all columns) but less flexible:
select *
from table tb
where sort_array(array(tb.*))[n-1] > 0
;
The above will sort the array ascending so you can check if the largest value in the array is greater than zero and return only those rows.
Why do so many complicated queries use this?
INSERT INTO DB.1.table_name_1
SELECT * FROM DB.2_table_name_2
WHERE table_name_1.COL_NAME > 0
INSERT OVERWRITE DB.1.table_name_1
SELECT * FROM DB.2_table_name_2
WHERE table_name_1.COL_NAME > 0

Filter on a nested aggregate SUM function not working

I have these two tables (the names have been pluralized for the sake of the example):
Table Locations:
idlocation varchar(12)
name varchar(50)
Table Answers:
idlocation varchar(6)
question_number varchar(3)
answer_text1 varchar(300)
answer_text2 varchar(300)
This table can hold answers for multiple locations, according to a list of numbered questions that repeats for each of them.
What I am trying to do is add up the values in the answer_text1 and answer_text2 columns for each location in the Locations table, but only for a specific question, and then output a value (1 or 0) based on the result.
The query is as follows, using a nested SELECT on Answers to perform the SUM operation:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from Answers c
where b.idlocation=c.idlocation and c.question_number='05'
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
In the Answers table, I sometimes store a date string in the answer_text2 field, but for a different question number.
When I run the query I get the following error:
Conversion failed when converting the varchar value '27/12/2013' to data type int
I do have the value '27/12/2013' in the answer_text2 field, but for a different question, so it seems my filter in the nested SELECT (the condition after b.idlocation=c.idlocation) is being ignored and more questions are being added up, hence the error above.
Update
Based on Steve's suggested solution, I ended up adding a filter inside my SUM so that non-integer char/varchar values are skipped, with a small variation:
Every possible non-INT string value has a length greater than 2 ('00' to '99' for my question numbers), so I use this filter to decide when to apply the cast.
'RESULT' =
case when (
select sum(
case when len(c.answer_text1) <= 2 then
cast(isnull(c.answer_text1,'0') as int)
else
0
end
) +
sum(
case when len(c.answer_text2) <= 2 then
cast(isnull(c.answer_text2,'0') as int)
else
0
end
)
from Answers c
where c.idlocation=b.idlocation and c.question_number='05'
) > 0
then
1
else
0
end
This is an unfortunate result of how the SQL Server query processor/optimizer works. In my opinion, the SQL standard prohibits the calculation of SELECT list expressions before the rows that will contribute to the result set have been identified, but the optimizer considers plans that violate this prohibition.
What you're observing is an error in the evaluation of a SELECT list item on a row that is not in the result set of your query. While this shouldn't happen, it does, and it's somewhat understandable, because to protect against it in every situation would exclude many efficient query plans from consideration. The vast majority of SELECT expressions will never raise an error, regardless of data.
What you can do is try to protect against this with an additional CASE expression. To protect against strings with the '/' character, for example:
... SUM(CASE WHEN c.answer_text1 IS NOT NULL and c.answer_text1 NOT LIKE '%/%' THEN CAST(c.answer_text1 as int) ELSE 0 END)...
If you're using SQL Server 2012 or later, you have a better option, TRY_CONVERT:
...SUM(COALESCE(TRY_CONVERT(int, c.answer_text1), 0))...
In your particular case, the overall database design is flawed, because numeric information should be stored in number-type columns. (This, of course, may not be your fault.) So redesign is an option: put integer answers in integer-type columns and non-integer answer_text elsewhere. A compromise, if you can't redesign the tables, that I think will work, is to add a persisted computed column with value TRY_CONVERT(int, answer_text1) (or its best equivalent, based on what you know about the actual data in the table, perhaps the integer value of only those values containing no non-digit characters and having a length of less than 9).
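The CASE guard can be tried on SQLite from Python, with one important difference: SQLite does not raise an error when casting '27/12/2013' to INTEGER, it silently keeps the leading 27. So in this sketch the guard matters for correctness rather than for avoiding an error. The digits-only GLOB test is an assumed stand-in for the NOT LIKE '%/%' check or TRY_CONVERT, and the answers rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (answer_text1 TEXT, answer_text2 TEXT)")
conn.executemany("INSERT INTO answers VALUES (?, ?)",
                 [("3", "27/12/2013"), ("0", None)])

guarded, unguarded = conn.execute("""
    SELECT
      -- Guard: only cast values that are non-empty and digits only.
      SUM(CASE WHEN answer_text1 <> '' AND answer_text1 NOT GLOB '*[^0-9]*'
               THEN CAST(answer_text1 AS INTEGER) ELSE 0 END)
    + SUM(CASE WHEN answer_text2 <> '' AND answer_text2 NOT GLOB '*[^0-9]*'
               THEN CAST(answer_text2 AS INTEGER) ELSE 0 END),
      -- Unguarded: SQLite reads the leading '27' of the date string.
      SUM(CAST(answer_text1 AS INTEGER)) + SUM(CAST(answer_text2 AS INTEGER))
    FROM answers
""").fetchone()
print(guarded, unguarded)  # 3 30
```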
Your query appears correct enough, which means you have a Question 05 record with a datetime in either the answer_text1 or answer_text2 field.
Give this a shot to figure out which row has a date:
select *
from Answers
where question_number='05'
and (isdate(answer_text1) = 1 or isdate(answer_text2) = 1)
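Furthermore, you could filter out any rows that have dates in them. Keep in mind that ISDATE() depends on the session's SET DATEFORMAT and SET LANGUAGE settings, so a day-first string such as '27/12/2013' may not be recognized as a date under the default us_english (mdy) setting: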
where isdate(c.answer_text1) = 0
and isdate(c.answer_text2) = 0
and ...
Another option similar in nature to Steve's excellent answer is to filter your Answers table with a subquery like so:
select
l.idlocation,
'RESULT' = (
case when (
select
sum(cast(isnull(c.answer_text1,0) as int)) +
sum(cast(isnull(c.answer_text2,0) as int))
from (select answer_text1, answer_text2, idlocation from Answers where question_number ='05') c
where b.idlocation=c.idlocation
) > 0 then
1
else
0
end
)
from Locations l, Answers b
where l.idlocation=b.idlocation and b.question_number='05'
More generally, though, you could just write the query like this:
select locations.idlocation,
case when sum(case when isnumeric(answer_text1) = 1 then cast(answer_text1 as int) else 0 end)
+ sum(case when isnumeric(answer_text2) = 1 then cast(answer_text2 as int) else 0 end) > 0
then 1 else 0 end as RESULT
from locations
inner join answers on answers.idlocation = locations.idlocation
where answers.question_number ='05'
group by locations.idlocation
This produces the same RESULT values, one row per location. Note that ISNUMERIC() also returns 1 for strings such as '1.5' or '$' that cannot be cast to int; on SQL Server 2012 or later, TRY_CONVERT(int, ...) is the safer test.
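The GROUP BY form can be exercised on SQLite from Python to confirm it returns one RESULT per location. Assumptions: SQLite has no ISNUMERIC() or TRY_CONVERT(), so a digits-only GLOB test guards the casts, and the locations/answers rows are hypothetical (the date string sits on question 07, as in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (idlocation TEXT PRIMARY KEY, name TEXT);
CREATE TABLE answers (idlocation TEXT, question_number TEXT,
                      answer_text1 TEXT, answer_text2 TEXT);
INSERT INTO locations VALUES ('L1', 'North'), ('L2', 'South');
INSERT INTO answers VALUES
  ('L1', '05', '2', NULL),
  ('L1', '07', NULL, '27/12/2013'),  -- date string on another question
  ('L2', '05', '0', '0');
""")

rows = conn.execute("""
    SELECT l.idlocation,
           CASE WHEN SUM(CASE WHEN a.answer_text1 <> '' AND a.answer_text1 NOT GLOB '*[^0-9]*'
                              THEN CAST(a.answer_text1 AS INTEGER) ELSE 0 END)
                   + SUM(CASE WHEN a.answer_text2 <> '' AND a.answer_text2 NOT GLOB '*[^0-9]*'
                              THEN CAST(a.answer_text2 AS INTEGER) ELSE 0 END) > 0
                THEN 1 ELSE 0 END AS result
    FROM locations AS l
    INNER JOIN answers AS a ON a.idlocation = l.idlocation
    WHERE a.question_number = '05'
    GROUP BY l.idlocation
    ORDER BY l.idlocation
""").fetchall()
print(rows)  # [('L1', 1), ('L2', 0)]
```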