Difference between not equal and greater than for a number - sql

I have a query:
select *
from randomtable
where randomnumber <> 0
The column "random_number" will never be a negative number.
So, if I write the query as:
select *
from randomtable
where randomnumber > 0
Is there any major difference?

No, there is no difference at all (in your specific situation). All numeric comparisons take the same time.
What's done at the lowest level is that zero is subtracted from randomnumber, and then the result is examined. The > operator looks for a positive non-zero result while the <> operator looks for a non-zero result. Those comparisons are trivial, and take the same amount of time to perform.

If it will never be less than zero, then NO

The important thing is to determine how you know that randomnumber will never be negative. Is there a constraint that guarantees it? If not, what do you want your code to do if a bug somewhere else causes it to be negative?
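If the guarantee should exist, one way to make it explicit (a sketch, reusing the question's table and column names) is a CHECK constraint:
alter table randomtable
add constraint randomnumber_nonnegative check (randomnumber >= 0); -- rejects negative values at write time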

The result set should never be different, but the query path might be: the second form may choose an index range scan starting at randomnumber = 0 and reading the records in sequence. As such, the order of the results may differ.
If the order of the results is important, then put in an ORDER BY.
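For example:
select *
from randomtable
where randomnumber > 0
order by randomnumber -- explicit ordering instead of relying on the access path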

What does 'greater than' mean? It means 'not equal' AND 'not less than'. In this case, if the number cannot be less than zero, 'greater than' is equivalent to 'not equal'.

Select row with mostly higher value and rarely lower value

I'm trying to select a random row from a table, but there is a column in this table called Rate. I want the query to mostly return rows that have a higher rate, and only rarely return rows that have a lower rate. Is this possible?
Table:
CREATE TABLE _Random (Code varchar(128), Rate tinyint)
So you want a random row, but weighted towards the ones with higher rates?
It would also be good to know how many rows there are in the table - sorting the whole lot is kinda expensive. You may prefer a row_number approach to sorting by N GUIDs.
So... One option could be to generate a single number, and then divide 100 by it. Imagine we generate a number between 0 and 1.
.25 gives us 400, .5 gives us 200, .75 gives us 133... Notice that there's a curve here - so the numbers closer to 100 come up more often (subtract 100 to make the range start at 1).
You could use RAND() for a single value between 0 and 1 (it's probably good enough), and then do the division and subtraction to get a number. If this is higher than the count of records, then maybe repeat? But try to choose a division value that suits your table size.
If you need to weight it more, you could raise your RAND() value by some number, to flatten it out or steepen it up. Do some experimenting to see how it looks.
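A minimal sketch of that idea (SQL Server syntax, assuming the _Random table above; the +0.0001 guard against RAND() returning 0 and the ORDER BY Rate DESC ranking are my additions):
DECLARE @pick int = CAST(100.0 / (RAND() + 0.0001) - 100.0 AS int) + 1; -- skewed heavily towards small values
WITH Ranked AS (
    SELECT Code, Rate,
           ROW_NUMBER() OVER (ORDER BY Rate DESC) AS rn -- rank 1 = highest rate
    FROM _Random
)
SELECT Code, Rate
FROM Ranked
WHERE rn = @pick; -- if @pick exceeds the row count, nothing comes back: retry or clamp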
This query will fetch a random record that has an above-average rate:
SELECT TOP (1) * FROM _Random
WHERE Rate>(SELECT AVG(Rate) FROM _Random)
ORDER BY NEWID()

Select case mess in JFreeChart

I have a column (cliente_x_hora, a numeric field) that I put into intervals, counting the rows in each interval. I have 3 text fields (number of intervals, value between intervals, and initial value). When I select only the first two (with 5 intervals and a value of 1000), the query runs flawlessly and generates the expected bar chart.
Query (with the first two text fields set):
SELECT INTERVAL, COUNT(*) TOTAL FROM (
SELECT CASE WHEN CLIENTE_X_HORA>0 AND CLIENTE_X_HORA<=1000.00 THEN '0<CLIENTE_X_HORA<=1000.00'
WHEN CLIENTE_X_HORA>1000.00 AND CLIENTE_X_HORA<=2000.00 THEN '1000.00<CLIENTE_X_HORA<=2000.00'
WHEN CLIENTE_X_HORA>2000.00 AND CLIENTE_X_HORA<=3000.00 THEN '2000.00<CLIENTE_X_HORA<=3000.00'
WHEN CLIENTE_X_HORA>3000.00 AND CLIENTE_X_HORA<=4000.00 THEN '3000.00<CLIENTE_X_HORA<=4000.00'
ELSE '4000.00<CLIENTE_X_HORA' END INTERVAL, CLIENTE_X_HORA FROM SGD_CAUSA)
GROUP BY INTERVAL ORDER BY TOTAL
The problem is when I select the last field (initial value, for example 2000): my bar chart goes crazy (I believe it is adding up the discarded values below 2000). That ELSE bucket (>6000) should be much smaller than it is showing. How can I solve that?
Best Regards,
DDias
CLARIFICATION from OP:
The query is the same as above but begins at 2000:
SELECT CASE WHEN CLIENTE_X_HORA>2000 AND CLIENTE_X_HORA<=3000.00 ...
and ends at 6000:
ELSE '6000.00<CLIENTE_X_HORA' END INTERVAL, CLIENTE_X_HORA FROM SGD_CAUSA) GROUP BY INTERVAL ORDER BY TOTAL
Putting the result in table form is impractical (we are talking about over 87 thousand rows). This always happens when I give an initial value different from zero.
Your ELSE is just that: an ELSE. It includes everything that is not matched by the specific WHENs.
So if you do not start from zero, that last bucket will include everything below the lowest limit in addition to everything greater than the highest limit.
If you do not want this behavior, do not use ELSE at all. Use WHEN CLIENTE_X_HORA > 6000.00 (or whatever your highest limit is) as the last condition.
EDIT:
In your inner query, filter out (with WHERE) the values that are below the lowest limit.
Since the unneeded low range is gone, you no longer need the HAVING clause we added, and you can even go back to using ELSE.
If your lowest limit is zero, then you will be filtering out everything below 0, which I assume is nothing.
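Putting both pieces together, a sketch for the 2000-6000 case from the clarification (bucket boundaries are illustrative):
SELECT INTERVAL, COUNT(*) TOTAL FROM (
SELECT CASE WHEN CLIENTE_X_HORA<=3000.00 THEN '2000.00<CLIENTE_X_HORA<=3000.00'
WHEN CLIENTE_X_HORA<=4000.00 THEN '3000.00<CLIENTE_X_HORA<=4000.00'
WHEN CLIENTE_X_HORA<=5000.00 THEN '4000.00<CLIENTE_X_HORA<=5000.00'
WHEN CLIENTE_X_HORA<=6000.00 THEN '5000.00<CLIENTE_X_HORA<=6000.00'
ELSE '6000.00<CLIENTE_X_HORA' END INTERVAL
FROM SGD_CAUSA
WHERE CLIENTE_X_HORA>2000.00 -- rows below the initial value never reach the CASE
)
GROUP BY INTERVAL ORDER BY TOTAL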

SQL and logical operators and null checks

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact stipulating it would arguably be a bad idea, for query-optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even when the column value is null. Now, exactly why that would be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
What exactly is the issue? Does it have to do with how indexes work or something?
My grounding in SQL is pretty weak.
I don't know SQL Server so I can't speak to that.
Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:
Expression Evaluation Rules
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so
null is null or null < 10
is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.
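A quick sanity check (PostgreSQL; the casts just pin the type):
select null::int < 10 as lt, null::int < null::int as lt_null, (null::int is null or null::int < 10) as whole_predicate;
-- lt and lt_null are NULL; whole_predicate is true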
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:
SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE short-circuits but neither AND nor OR does, so you only need to use a CASE when you have to protect against side effects.
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.
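With parentheses, the intended grouping is explicit:
... WHERE (the_column IS NULL OR the_column < 10) AND the_other_column > 20 ...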
I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody once gave you against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same single column of the same table, which is OK. If your second condition were a join condition (i.e., if it restricted columns from two different tables), then you could get into bad situations where the query planner has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hampering your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Nulls can be confusing. The "... WHERE 1 = CASE ..." pattern is useful if you are trying to pass either a null or a value as a parameter, e.g. "WHERE the_column = #parameter". This post may be helpful: Passing Null using OLEDB.
Another example where CASE is useful is when using date functions on varchar columns. Adding ISDATE before using, say, CONVERT(colA, datetime) might not work, and when colA holds non-date data the query can error out.
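A sketch of that pattern (SQL Server syntax; some_table and colA are hypothetical names):
SELECT CASE WHEN ISDATE(colA) = 1 THEN CONVERT(datetime, colA) END AS colA_as_date
FROM some_table
-- rows where colA is not a valid date yield NULL instead of raising a conversion error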

What does "WHERE id <> 0" clause mean in SQL?

Query: SELECT id, name FROM users u WHERE id <> 0 LIMIT 50 OFFSET 0
What does the clause id <> 0 mean here? Does it mean:
id is less than zero or id is greater than zero
<> means "not equal" (it can also be written as != with some DBMS)
It means not equal and apparently I have to submit at least 30 characters for my answer.
It means, "where ID is different from 0".
So, both greater than or less than 0.
It means that you are fetching all records with an id different from 0 (zero). I've sometimes used this just to check whether a record is already saved (if the record has an ID, it means it is saved).
It means: only include results whose id field has a value greater or smaller than 0, i.e. records that have non-zero ids. But really, a zero id should not be possible; if it is, I'd recommend reconsidering your table design.

Netezza does not do lazy evaluation of case statements?

I'm performing a computation which might contain division by 0, in which case I want the result to be an arbitrary value (55). To my surprise, wrapping the computation with a case statement did not do the job!
select case when 1=0 then 3/0 else 55 end
ERROR HY000: Divide by 0
Why is that? Is there another workaround?
OK, I was being inaccurate. This is the exact query that fails with "divide by 0":
select case when min(baba) = 0 then 55 else sum(1/baba) end from t group by baba
This looks like a lazy-evaluation failure in Netezza: notice that I group by baba, so whenever baba is 0, min(baba) is also 0, and evaluation should have stopped gracefully without ever reaching the 1/baba term and failing on division by 0. Right? Well, no.
My guess is that the gotcha, and the reason for the failure, is that Netezza evaluates the row-level terms before it can evaluate the aggregate terms. So it must evaluate 1/baba and baba for every row, and only then can it evaluate the aggregates min(baba) and sum(1/baba).
So the workaround (for me) was:
select case when min(baba) = 0 then 55 else 1/min(baba) end from t group by baba
which has the same meaning.
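As an aside (not part of the original answer), another common guard is NULLIF, which turns a zero denominator into NULL so the per-row division never raises, and SUM() simply skips the NULLs; a sketch, assuming Netezza exposes the standard NULLIF:
select case when min(baba) = 0 then 55 else sum(1/nullif(baba, 0)) end from t group by baba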