SQL comparison operators - sql

are these two statements the same?
query 1: WHERE salary > 999;
query 2: WHERE salary >= 1000;
I thought they were, but apparently according to my peers they are not (Though they have failed to explain why).

That's not necessarily the same. If you're storing doubles, when you do:
WHERE salary >= 1000;
you're not taking in count all values that are between 999 and 1000 (eg. 999.50)
Otherwise, if you're working with integers, that's true, not only in programming, but also in maths.
n > k <=> n >= k+1

The two queries will give the same results, if salary is an integral type. But if salary is some real type, then the results will be different.

This is entirely dependant on the datatype of salary.
If salary is an INT or a BIGINT then yes they will yield the same results.
If salary is pretty much any other datatype, the first once will return results for 999.9, but the second one won't.

Related

How to limit the character amount you are selecting from

I'm trying to average a field and it's very simple to do but there are some problems with some values. There are values I know are way too big and I was hoping to exclude them by the number of characters (I would probably put 4 characters max).
I'm unfamiliar with a sql clause that could execute this. If there is one that would be great.
select avg(convert(float,duration)) as averageduration
from AsteriskCalls where ISNUMERIC(duration) = 1
I expect the output to be around 500-1000 but it comes up as an 8 digit number.
That is easy enough:
select avg(convert(float,duration)) as averageduration
from AsteriskCalls
where ISNUMERIC(duration) = 1 and length(duration) <= 4;
This will not necessarily work, of course, because you could have '1E30', which would be a pretty big number. And it would miss '0.001', which is a pretty small number.
The more accurate method uses try_convert():
select avg(try_convert(float, duration)) as averageduration
from AsteriskCalls
where try_convert(float, duration) <= 1000.0
And that should probably really be:
where abs(try_convert(float, duration)) <= 1000.0

Oracle SQL Developer Create View Assignment

I've got an assignment with the following instructions:
Create a view named A11T1 (that's A-One-One-T-One, not A-L-L-T-L) that will display the concatenated name, JobTitle and Salary of the people who have a Cat value of N and whose salary is at least 30 percent higher than the average salary of all people who have a Cat value of N. The three column headings should be Name, JobTitle and Salary. The rows should be sorted in traditional phonebook order.
Note 1: As always, concatenated names must appear with one space between the first and last names.
Note 2: The concatenated names and job titles must be displayed in proper case (e.g., Mary Ellen Smith, Assistant Manager) for this task.
Note 3: Remember, the Person11 data is messy. Be sure to look for N and n when you are identifying the people with a Cat value of N.
What I have so far is:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME||' '||LNAME) AS "Name", INITCAP(JobTitle), Salary
FROM PERSON11
WHERE UPPER(CAT) = 'N'
GROUP by INITCAP(FNAME||' '||LNAME), INITCAP(JobTitle), Salary
HAVING SALARY >= 1.3 * ROUND(AVG(SALARY),0)
Order by LNAME, FNAME
Error at Command Line:7 Column:10 Error report: SQL Error: ORA-00979: not a GROUP BY expression 00979. 00000 - "not a GROUP BY expression"
Is the current error I'm getting
No matter how much I edit my code it just won't create into a view and I've been stuck on this for hours! I appreciate any responses, even a point in the right direction.
Why do you need to "group by" concatenated name, job title and salary? Do you have more than one row per name?
Perhaps it's because you need to compute the average salary and that requires aggregation? You can't do everything in a single SELECT statement in SQL (at least not with simple tools - you seem to be in the early stages of learning and not looking to use window functions).
The "avg salary" needs to come from a subquery. Where you have >= 1.3 * round(...) you should have instead:
... >= 1.3 * (select avg(salary) from person11 where cat = 'N')
Note that the subquery must be enclosed in parentheses. In your code I see you use upper(cat) - is there a concern that cat may be upper or lower case? In that case it may be better to write
cat in ('n', 'N')
Avoid wrapping column values inside functions whenever possible (that often leads to worse performance). Also, I see no need to round the average salary in your requirements - and in any case, what's the point to rounding to zero decimal places if you then multiply by 1.3? Rounding may actually lead to incorrect output.
EDIT: Sorry, to clarify: I think you are well on your way already. Use the subquery for the average salary, remove the group by (which doesn't hurt anything but is really unneeded), and if you care to, change the upper(cat) as I suggested; I think your query will work with these changes.
Good luck!
I think the easiest way uses window functions:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME || ' '|| LNAME) AS Name, INITCAP(JobTitle), Salary
FROM (SELECT p.*, AVG(SALARY) OVER () as avg_salary
FROM FROM PERSON11 p
WHERE UPPER(CAT) = 'N'
) p
WHERE SALARY >= 1.3 * avg_salary
ORDER BY LNAME, FNAME ;

SQL Server Floating Point in WHERE clause

I'm trying to query a database, I need to get a list of customers where their weight is equal to 60.5. The problem is that 60.5 is a real I've never query a database with a real in a where clause before.
I've tried this:
SELECT Name FROM Customers WHERE Weight=60.5
SELECT Name FROM Customers WHERE Weight=cast(60.5 as real)
SELECT Name FROM Customers WHERE Weight=cast(60.5 as decimal)
SELECT Name FROM Customers WHERE Weight=convert(real,'60.5')
SELECT Name FROM Customers WHERE Weight=convert(decimal,'60.5')
These queries return 0 values but in the Customers table their are 10 rows with Weight=60.5
Your problem is that floating point numbers are inaccurate by definition. Comparing what seems to be 60.5 to a literal 60.5 might not work as you've noticed.
A typical solution is to measure the difference between 2 values, and if it's smaller then some predefined epsilon, consider them equal:
SELECT Name FROM Customers WHERE ABS(Weight-60.5) < 0.001
For better performance, you should actually use:
SELECT Name FROM Customers WHERE Weight BETWEEN 64.999 AND 65.001
If you need equality comparison, you should change the type of the column to DECIMAL. Decimal numbers are stored and compared exactly, while real and float numbers are approximations.
#Amit's answer will work, but it will perform quite poorly in comparison to my approach. ABS(Weight-60.5) < 0.001 is unable to use index seeks. But if you convert the column to DECIMAL, then Weight=60.5 will perform well and use index seeks.

ALL operator VS Any on an empty query

I'm reading the oracle documentation on the ANY and ALL operators. I pretty understand their uses except for one thing. It states:
ALL :
If a subquery returns zero rows, the condition evaluates to TRUE.
ANY :
If a subquery returns zero rows, the condition evaluates to FALSE.
It doesn't seem very logical to me. Why would ALL on an empty subquery would return TRUE but ANY returns FALSE?
I'm relatively new to SQL, so I assume it would have a use-case for this behavior which is really counter-intuitive to me.
ANY and ALL on an empty set should return the same value no?
Consider the example of the EMP table in that link.
Specifically this query -
SELECT e1.empno, e1.sal
FROM emp e1
WHERE e1.sal > ANY (SELECT e2.sal
FROM emp e2
WHERE e2.deptno = 20);
In case of ANY, the question you are asking is "Is my salary greater than anyone in department 20 (at least 1 person)". This means you are hoping at least one person has a salary less than you. When there are no rows, this returns FALSE because there is nobody whose salary is less than you, you were hoping for at least one.
In case of ALL, the obvious question you would be asking is "Is my salary greater than everyone?". Rephrasing that as "Is there nobody that has salary greater than me?" When there are no rows returned, your answer is TRUE, because "there is indeed nobody whose salary is greater than me.
Because ANY is to be interpreted as EXIST (if there is any, it means they exist). Therefore, it return false if no rows are found.
All does not certify that any values exist, it just certifies that it represents all possible values. Therefore, it return true even if no rows are found.

Why use the BETWEEN operator when we can do without it?

As seen below the two queries, we find that they both work well. Then I am confused why should we ever use BETWEEN because I have found that BETWEEN behaves differently in different databases as found in w3school
SELECT *
FROM employees
WHERE salary BETWEEN 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000
AND salary <= 15000;
BETWEEN can help to avoid unnecessary reevaluation of the expression:
SELECT AVG(RAND(20091225) BETWEEN 0.2 AND 0.4)
FROM t_source;
---
0.1998
SELECT AVG(RAND(20091225) >= 0.2 AND RAND(20091225) <= 0.4)
FROM t_source;
---
0.3199
t_source is just a dummy table with 1,000,000 records.
Of course this can be worked around using a subquery, but in MySQL it's less efficient.
And of course, BETWEEN is more readable. It takes 3 times to use it in a query to remember the syntax forever.
In SQL Server and MySQL, LIKE against a constant with non-leading '%' is also a shorthand for a pair of >= and <:
SET SHOWPLAN_TEXT ON
GO
SELECT *
FROM master
WHERE name LIKE 'string%'
GO
SET SHOWPLAN_TEXT OFF
GO
|--Index Seek(OBJECT:([test].[dbo].[master].[ix_name_desc]), SEEK:([test].[dbo].[master].[name] < 'strinH' AND [test].[dbo].[master].[name] >= 'string'), WHERE:([test].[dbo].[master].[name] like 'string%') ORDERED FORWARD)
However, LIKE syntax is more legible.
Using BETWEEN has extra merits when the expression that is compared is a complex calculation rather than just a simple column; it saves writing out that complex expression twice.
BETWEEN in T-SQL supports NOT operator, so you can use constructions like
WHERE salary not between 5000 AND 15000;
In my opinion it's more clear for a human then
WHERE salary < 5000 OR salary > 15000;
And finally if you type column name just one time it gives you less chances to make a mistake
The version with "between" is easier to read. If I were to use the second version I'd probably write it as
5000 <= salary and salary <= 15000
for the same reason.
I vote #Quassnoi - correctness is a big win.
I usually find literals more useful than the syntax symbols like <, <=, >, >=, != etc. Yes, we need (better, accurate) results. And at least I get rid of probabilities of mis-interpreting and reverting meanings of the symbols visually. If you use <= and sense logically incorrect output coming from your select query, you may wander some time and only arrive to the conclusion that you did write <= in place of >= [visual mis-interpretation?]. Hope I am clear.
And aren't we shortening the code (along with making it more higher-level-looking), which means more concise and easy to maintain?
SELECT *
FROM emplyees
WHERE salary between 5000 AND 15000;
SELECT *
FROM emplyees
WHERE salary >= 5000 AND salary <= 15000;
First query uses only 10 words and second uses 12!
Personally, I wouldn't use BETWEEN, simply because there seems no clear definition of whether it should include, or exclude, the values which serve to bound the condition, in your given example:
SELECT *
FROM emplyees
WHERE salary between 5000 AND 15000;
The range could include the 5000 and 15000, or it could exclude them.
Syntactically I think it should exclude them, since the values themselves are not between the given numbers. But my opinion is precisely that, whereas using operators such as >= is very specific. And less likely to change between databases, or between incremements/versions of the same.
Edited in response to Pavel and Jonathan's comments.
As noted by Pavel, ANSI SQL (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) as far back as 1992, mandates the end-points should be considered within the returned date and equivalent to X >= lower_bound AND X <= upper_bound:
8.3
Function
Specify a range comparison.
Format
<between predicate> ::=
<row value constructor> [ NOT ] BETWEEN
<row value constructor> AND <row value constructor>
Syntax Rules
1) The three <row value constructor>s shall be of the same degree.
2) Let respective values be values with the same ordinal position
in the two <row value constructor>s.
3) The data types of the respective values of the three <row value
constructor>s shall be comparable.
4) Let X, Y, and Z be the first, second, and third <row value con-
structor>s, respectively.
5) "X NOT BETWEEN Y AND Z" is equivalent to "NOT ( X BETWEEN Y AND
Z )".
6) "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z".
If the endpoints are inclusive, then BETWEEN is the preferred syntax.
Less references to a column means less spots to update when things change. It's the engineering principle, that less things means less stuff can break.
It also means less possibility of someone putting the wrong bracket for things like including an OR. IE:
WHERE salary BETWEEN 5000 AND (15000
OR ...)
...you'll get an error if you put the bracket around the AND part of a BETWEEN statement. Versus:
WHERE salary >= 5000
AND (salary <= 15000
OR ...)
...you'd only know there's a problem when someone reviews the data returned from the query.
Semantically, the two expressions have the same result.
However, BETWEEN is a single predicate, instead of two comparison predicates combined with AND. Depending on the optimizer provided by your RDBMS, a single predicate may be easier to optimize than two predicates.
Although I expect most modern RDBMS implementations should optimize the two expressions identically.
worse if it's
SELECT id FROM entries
WHERE
(SELECT COUNT(id) FROM anothertable WHERE something LEFT JOIN something ON...)
BETWEEN entries.max AND entries.min;
Rewrite this one with your syntax without using temporary storage.
I'd better use the 2nd one, as you always know if it's <= or <
In SQL, I agree that BETWEEN is mostly unnecessary, and can be emulated syntactically with 5000 <= salary AND salary <= 15000. It is also limited; I often want to apply an inclusive lower bound and an exclusive upper bound: #start <= when AND when < #end, which you can't do with BETWEEN.
OTOH, BETWEEN is convenient if the value being tested is the result of a complex expression.
It would be nice if SQL and other languages would follows Python's lead in using proper mathematical notation: 5000 <= salary <= 15000.
One small tip that wil make your code more readable: use < and <= in preference to > and >=.