Applying the MIN aggregate function to a BIT field - sql

I want to write the following query:
SELECT ..., MIN(SomeBitField), ...
FROM ...
WHERE ...
GROUP BY ...
The problem is, SQL Server does not like it, when I want to calculate the minimum value of a bit field it returns the error Operand data type bit is invalid for min operator.
I could use the following workaround:
SELECT ..., CAST(MIN(CAST(SomeBitField AS INT)) AS BIT), ...
FROM ...
WHERE ...
GROUP BY ...
But, is there something more elegant? (For example, there might be an aggregate function, that I don't know, and that evaluates the logical and of the bit values in a field.)

One option is MIN(SomeBitField+0). It reads well, with less noise (which I would qualify as elegance).
That said, it's more hack-ish than the CASE option. And I don't know anything about speed/efficiency.

Since there are only two options for BIT, just use a case statement:
SELECT CASE WHEN EXISTS (SELECT 1 FROM ....) THEN 1 ELSE 0 END AS 'MinBit'
FROM ...
WHERE ...
This has the advantage of:
Not forcing a table scan (indexes on BIT fields pretty much never get used)
Short circuiting TWICE (once for EXISTS and again for the CASE)
It is a little more code to write but it shouldn't be terrible. If you have multiple values to check you could always encapsulate your larger result set (with all the JOIN and FILTER criteria) in a CTE at the beginning of the query, then reference that in the CASE statements.

This query is the best solution:
SELECT CASE WHEN MIN(BitField+0) = 1 THEN 'True' ELSE 'False' END AS MyColumn
FROM MyTable
When you add the BitField+0 it would automatically becomes like int

select min(convert(int, somebitfield))
or if you want to keep result as bit
select convert(bit, min(convert(int, somebitfield)))

Try the following
Note: Min represent And aggregate function , Max represent Or aggregate function
SELECT ..., MIN(case when SomeBitField=1 then 1 else 0 end), MIN(SomeBitField+0)...
FROM ...
WHERE ...
GROUP BY ...
same result

This small piece of code has always worked with me like a charm:
CONVERT(BIT, MIN(CONVERT(INT, BitField))) as BitField

AVG(CAST(boolean_column AS FLOAT)) OVER(...) AS BOOLEAN_AGGREGATE
Give a fuzzy boolean :
1 indicate that's all True;
0 indicate that's all false;
a value between ]0..1[ indicate partial matching and can be some percentage of truth.

Related

Dealing with null in a case statement on a casted value in big query

I want to group a value which is astring of decimals into named groups.
Example :
CASE WHEN CAST(X as NUMERIC)<1000 THEN "Under1000" ELSE "Over1000" END
As I've got some values missing, I would rather use safe_cast instead of cast and want a specific group for missing values.
I could go for :
CASE WHEN SAFE_CAST(X as NUMERIC) = NULL THEN "MissingData" WHEN SAFE_CAST(X as NUMERIC)<1000 THEN "Under1000" ELSE "Over1000" END
But what annoys me here is that I'm reapeating the safe_cast operation.
Is there a way to avoid that ?
I've been reading following example :
CASE operation(X) WHEN result1 THEN "result1" WHEN result2 THEN "result2" ELSE "other_result" END
But that kind of syntax seems to work only for equality operator in the when statements (ie operation(X) = result1 or operation(X) = result2 etc.).
And here I use inferior (or superior)... So I don't know how to manage that.
I guess there must be a way to avoid that operation repetition but can't figure out how.
Thanks for your help.
This should help you to avoid writing SAFE_CAST() several times:
WITH your_data AS (
SELECT "Bob" as name, "150.19" as weight UNION ALL
SELECT "Tom", "2000.90" UNION ALL
SELECT "Jerry", Null)
, transform as (
SELECT name, CAST(weight as NUMERIC) as weight
FROM your_data
)
SELECT
name,
CASE
WHEN weight IS NULL
THEN "MissingData"
WHEN weight<1000
THEN "Under1000"
ELSE "Over1000"
END as weight_agg
FROM transform
Results:
Not sure there is a syntax-based answer that would help, but one potential way to achieve "cleaner" or non-repetitive code is break up your query into logical chunks using CTEs.
with data as (), -- raw data
casted as (), -- do your safe_cast here
transformed as () - do your case statement here
select * from transformed
It does make for "longer" code, but it also allows for cleaner logic in the transformation stage (your stated goal).

Search Through All Between Values SQL

I have data following data structure..
_ID _BEGIN _END
7003 99210 99217
7003 10225 10324
7003 111111
I want to look through every _BEGIN and _END and return all rows where the input value is between the range of values including the values themselves (i.e. if 10324 is the input, row 2 would be returned)
I have tried this filter but it does not work..
where #theInput between a._BEGIN and a._END
--THIS WORKS
where convert(char(7),'10400') >= convert(char(7),a._BEGIN)
--BUT ADDING THIS BREAKS AND RETURNS NOTHING
AND convert(char(7),'10400') < convert(char(7),a._END)
Less than < and greater than > operators work on xCHAR data types without any syntactical error, but it may go semantically wrong. Look at examples:
1 - SELECT 'ab' BETWEEN 'aa' AND 'ac' # returns TRUE
2 - SELECT '2' BETWEEN '1' AND '10' # returns FALSE
Character 2 as being stored in a xCHAR type has greater value than 1xxxxx
So you should CAST types here. [Exampled on MySQL - For standard compatibility change UNSIGNED to INTEGER]
WHERE CAST(#theInput as UNSIGNED)
BETWEEN CAST(a._BEGIN as UNSIGNED) AND CAST(a._END as UNSIGNED)
You'd better change the types of columns to avoid ambiguity for later use.
This would be the obvious answer...
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #theInput between a._BEGIN and a._END
If the data is string (assuming here as we don't know what DB) You could add this.
Declare #searchArg VARCHAR(30) = CAST(#theInput as VARCHAR(30));
SELECT *
FROM <YOUR_TABLE_NAME> a
WHERE #searchArg between a._BEGIN and a._END
If you care about performance and you've got a lot of data and indexes you won't want to include function calls on the column values.. you could in-line this conversion but this assures that your predicates are Sargable.
SELECT * FROM myTable
WHERE
(CAST(#theInput AS char) >= a._BEGIN AND #theInput < a.END);
I also saw several of the same type of questions:
SQL "between" not inclusive
MySQL "between" clause not inclusive?
When I do queries like this, I usually try one side with the greater/less than on either side and work from there. Maybe that can help. I'm very slow, but I do lots of trial and error.
Or, use Tony's convert.
I supposed you can convert them to anything appropriate for your program, numeric or text.
Also, see here, http://technet.microsoft.com/en-us/library/aa226054%28v=sql.80%29.aspx.
I am not convinced you cannot do your CAST in the SELECT.
Nick, here is a MySQL version from SO, MySQL "between" clause not inclusive?

Avoid division by zero in PostgreSQL

I'd like to perform division in a SELECT clause. When I join some tables and use aggregate function I often have either null or zero values as the dividers. As for now I only come up with this method of avoiding the division by zero and null values.
(CASE(COALESCE(COUNT(column_name),1)) WHEN 0 THEN 1
ELSE (COALESCE(COUNT(column_name),1)) END)
I wonder if there is a better way of doing this?
You can use NULLIF function e.g.
something/NULLIF(column_name,0)
If the value of column_name is 0 - result of entire expression will be NULL
Since count() never returns NULL (unlike other aggregate functions), you only have to catch the 0 case (which is the only problematic case anyway). So, your query simplified:
CASE count(column_name)
WHEN 0 THEN 1
ELSE count(column_name)
END
Or simpler, yet, with NULLIF(), like Yuriy provided.
Quoting the manual about aggregate functions:
It should be noted that except for count, these functions return a
null value when no rows are selected.
I realize this is an old question, but another solution would be to make use of the greatest function:
greatest( count(column_name), 1 ) -- NULL and 0 are valid argument values
Note:
My preference would be to either return a NULL, as in Erwin and Yuriy's answer, or to solve this logically by detecting the value is 0 before the division operation, and returning 0. Otherwise, the data may be misrepresented by using 1.
Another solution avoiding division by zero, replacing to 1
select column + (column = 0)::integer;
If you want the divider to be 1 when the count is zero:
count(column_name) + 1 * (count(column_name) = 0)::integer
The cast from true to integer is 1.

how can I force SQL to only evaluate a join if the value can be converted to an INT?

I've got a query that uses several subqueries. It's about 100 lines, so I'll leave it out. The issue is that I have several rows returned as part of one subquery that need to be joined to an integer value from the main query. Like so:
Select
... columns ...
from
... tables ...
(
select
... column ...
from
... tables ...
INNER JOIN core.Type mt
on m.TypeID = mt.TypeID
where dpt.[DataPointTypeName] = 'TheDataPointType'
and m.TypeID in (100008, 100009, 100738, 100739)
and datediff(d, m.MeasureEntered, GETDATE()) < 365 -- only care about measures from past year
and dp.DataPointValue <> ''
) as subMdp
) as subMeas
on (subMeas.DataPointValue NOT LIKE '%[^0-9]%'
and subMeas.DataPointValue = cast(vcert.IDNumber as varchar(50))) -- THIS LINE
... more tables etc ...
The issue is that if I take out the cast(vcert.IDNumber as varchar(50))) it will attempt to compare a value like 'daffodil' to a number like 3245. Even though the datapoint that contains 'daffodil' is an orphan record that should be filtered out by the INNER JOIN 4 lines above it. It works fine if I try to compare a string to a string but blows up if I try to compare a string to an int -- even though I have a clause in there to only look at things that can be converted to integers: NOT LIKE '%[^0-9]%'. If I specifically filter out the record containing 'daffodil' then it's fine. If I move the NOT LIKE line into the subquery it will still fail. It's like the NOT LIKE is evaluated last no matter what I do.
So the real question is why SQL would be evaluating a JOIN clause before evaluating a WHERE clause contained in a subquery. Also how I can force it to only evaluate the JOIN clause if the value being evaluated is convertible to an INT. Also why it would be evaluating a record that will definitely not be present after an INNER JOIN is applied.
I understand that there's a strong element of query optimizer voodoo going on here. On the other hand I'm telling it to do an INNER JOIN and the optimizer is specifically ignoring it. I'd like to know why.
The problem you are having is discussed in this item of feedback on the connect site.
Whilst logically you might expect the filter to exclude any DataPointValue values that contain any non numeric characters SQL Server appears to be ordering the CAST operation in the execution plan before this filter happens. Hence the error.
Until Denali comes along with its TRY_CONVERT function the way around this is to wrap the usage of the column in a case expression that repeats the same logic as the filter.
So the real question is why SQL would be evaluating a JOIN clause
before evaluating a WHERE clause contained in a subquery.
Because SQL engines are required to behave as if that's what they do. They're required to act like they build a working table from all of the table constructors in the FROM clause; expressions in the WHERE clause are applied to that working table.
Joe Celko wrote about this many times on Usenet. Here's an old version with more details.
First of all,
NOT LIKE '%[^0-9]%'
isn`t work well. Example:
DECLARE #Int nvarchar(20)= ' 454 54'
SELECT CASE WHEN #INT LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS Is_Number
Result: 1
But it is not a number!
To check if it is real int value , you should use ISNUMERIC function. Let`s check this:
DECLARE #Int nvarchar(20)= ' 454 54'
SELECT ISNUMERIC(#int) Is_Int
Result:0
Result is correct.
So, instead of
NOT LIKE '%[^0-9]%'
try to change this to
ISNUMERIC(subMeas.DataPointValue)=0
UPDATE
How check if value is integer?
First here:
WHERE ISNUMERIC(str) AND str NOT LIKE '%.%' AND str NOT LIKE '%e%' AND str NOT LIKE '%-%'
Second:
CREATE Function dbo.IsInteger(#Value VarChar(18))
Returns Bit
As
Begin
Return IsNull(
(Select Case When CharIndex('.', #Value) > 0
Then Case When Convert(int, ParseName(#Value, 1)) <> 0
Then 0
Else 1
End
Else 1
End
Where IsNumeric(#Value + 'e0') = 1), 0)
End
Filter out the non-numeric records in a subquery or CTE

What does this SQL Query mean?

I have the following SQL query:
select AuditStatusId
from dbo.ABC_AuditStatus
where coalesce(AuditFrequency, 0) <> 0
I'm struggling a bit to understand it. It looks pretty simple, and I know what the coalesce operator does (more or less), but dont' seem to get the MEANING.
Without knowing anymore information except the query above, what do you think it means?
select AuditStatusId
from dbo.ABC_AuditStatus
where AuditFrequency <> 0 and AuditFrequency is not null
Note that the use of Coalesce means that it will not be possible to use an index properly to satisfy this query.
COALESCE is the ANSI standard function to deal with NULL values, by returning the first non-NULL value based on the comma delimited list. This:
WHERE COALESCE(AuditFrequency, 0) != 0
..means that if the AuditFrequency column is NULL, convert the value to be zero instead. Otherwise, the AuditFrequency value is returned.
Since the comparison is to not return rows where the AuditFrequency column value is zero, rows where AuditFrequency is NULL will also be ignored by the query.
It looks like it's designed to detect a null AuditFrequency as zero and thus hide those rows.
From what I can see, it checks for fields that aren't 0 or null.
I think it is more accurately described by this:
select AuditStatusId
from dbo.ABC_AuditStatus
where (AuditFrequency IS NOT NULL AND AuditFrequency != 0) OR 0 != 0
I'll admit the last part will never do anything and maybe i'm just being pedantic but to me this more accurately describes your query.
The idea is that it is desireable to express a single search condition using a single expression but it's merely style, a question of taste:
One expression:
WHERE age = COALESCE(#parameter_value, age);
Two expressions:
WHERE (
age = #parameter_value
OR
#parameter_value IS NULL
);
Here's another example:
One expression:
WHERE age BETWEEN 18 AND 65;
Two expressions
WHERE (
age >= 18
AND
age <= 65
);
Personally, I have a strong personal perference for single expressions and find them easier to read... if I am familiar with the pattern used ;) Whether they perform differently is another matter...