I just stumbled upon something in Oracle SQL (not sure if it's in others) that I'm curious about. I'm asking here as a wiki, since it's hard to search for symbols on Google...
I just found that, when checking a value against a set of values, you can write
WHERE x = ANY (a, b, c)
As opposed to the usual
WHERE x IN (a, b, c)
So I'm curious: what is the reasoning behind these two syntaxes? Is one standard and the other some oddball Oracle syntax? Or are they both standard? And is there a reason, performance or otherwise, to prefer one over the other?
Just curious what anyone can tell me about that '= ANY' syntax.
ANY (or its synonym SOME) is syntactic sugar for EXISTS with a simple correlation:
SELECT *
FROM mytable
WHERE x <= ANY
(
SELECT y
FROM othertable
)
is the same as:
SELECT *
FROM mytable m
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE m.x <= o.y
)
With the equality condition on a not-nullable field, it becomes similar to IN.
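Most major databases, including SQL Server, MySQL and PostgreSQL, support this keyword, although they expect a subquery (or, in PostgreSQL, an array) rather than a plain value list such as ANY (a, b, c).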
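SQLite has no ANY keyword at all, so there the EXISTS form above is the portable fallback. Here is a minimal sketch using Python's built-in sqlite3 module. The rows in mytable/othertable are made up, and the comparison assumes no NULLs. It runs the EXISTS query and checks it against a direct "x is <= at least one y" test:

```python
import sqlite3

# In-memory database with the tables from the example; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (x INTEGER);
    CREATE TABLE othertable (y INTEGER);
    INSERT INTO mytable VALUES (1), (5), (9);
    INSERT INTO othertable VALUES (3), (6);
""")

# SQLite has no ANY, so run the EXISTS rewrite of "x <= ANY (SELECT y FROM othertable)".
exists_rows = [r[0] for r in conn.execute("""
    SELECT x
    FROM mytable m
    WHERE EXISTS (SELECT NULL FROM othertable o WHERE m.x <= o.y)
    ORDER BY x
""")]

# The same predicate evaluated directly: keep x if it is <= at least one y.
ys = [r[0] for r in conn.execute("SELECT y FROM othertable")]
xs = [r[0] for r in conn.execute("SELECT x FROM mytable ORDER BY x")]
any_rows = [x for x in xs if any(x <= y for y in ys)]

print(exists_rows, any_rows)  # [1, 5] [1, 5]
```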
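IN: equal to any member of the list
ANY: compare the value to **each** value returned by the subquery; true if at least one comparison is true
ALL: compare the value to **EVERY** value returned by the subquery; true only if every comparison is true
< ANY (): less than the maximum
> ANY (): more than the minimum
= ANY (): equivalent to IN
> ALL (): more than the maximum
< ALL (): less than the minimum
(The maximum/minimum shortcuts assume the subquery returns at least one row and no NULLs.)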
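You can check the shortcuts in the list mechanically. The Python sketch below models ANY as "at least one comparison holds" and ALL as "every comparison holds", then tests each rule against MIN/MAX. It assumes a non-empty list with no NULLs, and the helper names cmp_any/cmp_all are made up for illustration:

```python
import operator

def cmp_any(op, x, values):
    """x <op> ANY (values): true if the comparison holds for at least one value."""
    return any(op(x, v) for v in values)

def cmp_all(op, x, values):
    """x <op> ALL (values): true if the comparison holds for every value."""
    return all(op(x, v) for v in values)

values = [3, 7, 10]  # hypothetical subquery result: non-empty, no NULLs

for x in range(0, 13):
    assert cmp_any(operator.lt, x, values) == (x < max(values))  # < ANY: less than maximum
    assert cmp_any(operator.gt, x, values) == (x > min(values))  # > ANY: more than minimum
    assert cmp_any(operator.eq, x, values) == (x in values)      # = ANY: same as IN
    assert cmp_all(operator.gt, x, values) == (x > max(values))  # > ALL: more than maximum
    assert cmp_all(operator.lt, x, values) == (x < min(values))  # < ALL: less than minimum

print("all equivalences hold for", values)
```

The shortcuts break down for an empty subquery result. There, x > ALL (empty) is TRUE for every x and x < ANY (empty) is FALSE, while MAX/MIN of an empty set is NULL.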
For example:
Find the employees whose salary equals the minimum salary of some department:
SELECT last_name, salary,department_id
FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);
Employees who are not IT Programmers and whose salary is less than that of any IT programmer:
SELECT employee_id, last_name, salary, job_id
FROM employees
WHERE salary <ANY
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Employees whose salary is less than the salary of all employees with a job ID of IT_PROG and whose job is not IT_PROG:
SELECT employee_id,last_name, salary,job_id
FROM employees
WHERE salary <ALL
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
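The last two queries can also be written with aggregates, which helps in engines that lack ANY/ALL. Here is a sketch in SQLite via Python's sqlite3, using a handful of made-up employees rows. It uses < MAX(...) in place of < ANY and < MIN(...) in place of < ALL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample rows; only the shape of the employees table matters here.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, last_name TEXT, salary INTEGER, job_id TEXT);
    INSERT INTO employees VALUES
        (1, 'Hunold',   9000, 'IT_PROG'),
        (2, 'Ernst',    6000, 'IT_PROG'),
        (3, 'Lorentz',  4200, 'IT_PROG'),
        (4, 'Rajs',     3500, 'ST_CLERK'),
        (5, 'Fay',      6000, 'MK_REP'),
        (6, 'Higgins', 12000, 'AC_MGR');
""")

# salary < ANY (IT_PROG salaries)  ==  salary < MAX(IT_PROG salary)
less_than_any = conn.execute("""
    SELECT last_name FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees WHERE job_id = 'IT_PROG')
      AND job_id <> 'IT_PROG'
    ORDER BY employee_id
""").fetchall()

# salary < ALL (IT_PROG salaries)  ==  salary < MIN(IT_PROG salary)
less_than_all = conn.execute("""
    SELECT last_name FROM employees
    WHERE salary < (SELECT MIN(salary) FROM employees WHERE job_id = 'IT_PROG')
      AND job_id <> 'IT_PROG'
    ORDER BY employee_id
""").fetchall()

print(less_than_any)  # [('Rajs',), ('Fay',)]
print(less_than_all)  # [('Rajs',)]
```

One caveat applies if no IT_PROG rows exist. In that case < ALL is TRUE for every row, but < (SELECT MIN(...)) compares against NULL and returns nothing. The rewrite therefore only matches when the subquery returns rows.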
Hope it helps.
-Noorin Fatima
To put it simply and quoting from O'Reilly's "Mastering Oracle SQL":
"Using IN with a subquery is functionally equivalent to using ANY, and returns TRUE if a match is found in the set returned by the subquery."
"We think you will agree that IN is more intuitive than ANY, which is why IN is almost always used in such situations."
Hope that clears up your question about ANY vs IN.
I believe that what you are looking for is this:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/opt_ops.htm#1005298
(Link found on Eddie Awad's Blog)
To sum it up here:
last_name IN ('SMITH', 'KING', 'JONES')
is transformed into
last_name = 'SMITH' OR last_name = 'KING' OR last_name = 'JONES'
while
salary > ANY (:first_sal, :second_sal)
is transformed into
salary > :first_sal OR salary > :second_sal
The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery.
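The transformation is easy to check in practice. The sketch below uses Python's sqlite3 and a made-up emp table. The IN list and the OR chain return the same rows. The expanded form of salary > ANY (:first_sal, :second_sal) runs with named bind variables; SQLite itself has no ANY, so only the OR form is executed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (last_name TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('SMITH', 800), ('KING', 5000), ('JONES', 2975), ('BLAKE', 2850);
""")

in_rows = conn.execute(
    "SELECT last_name FROM emp WHERE last_name IN ('SMITH', 'KING', 'JONES') ORDER BY last_name"
).fetchall()
or_rows = conn.execute(
    "SELECT last_name FROM emp "
    "WHERE last_name = 'SMITH' OR last_name = 'KING' OR last_name = 'JONES' ORDER BY last_name"
).fetchall()

# "salary > ANY (:first_sal, :second_sal)" written out as the OR the optimizer produces;
# sqlite3 binds :name placeholders from a dict.
params = {"first_sal": 2900, "second_sal": 4000}
any_rows = conn.execute(
    "SELECT last_name FROM emp WHERE salary > :first_sal OR salary > :second_sal ORDER BY last_name",
    params,
).fetchall()

print(in_rows == or_rows, in_rows)  # True [('JONES',), ('KING',), ('SMITH',)]
print(any_rows)                     # [('JONES',), ('KING',)]
```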
The ANY syntax allows you to write things like
WHERE x > ANY(a, b, c)
or even
WHERE x > ANY(SELECT ... FROM ...)
I'm not sure anyone on the planet actually uses ANY (or its sibling ALL).
A quick google found this http://theopensourcery.com/sqlanysomeall.htm
ANY allows you to use an operator other than =. In most other respects (apart from special cases for NULLs) it acts like IN. You can think of IN as ANY with the = operator.
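The NULL special cases are where IN (and NOT IN) tend to surprise people. Here is a quick check in SQLite through Python's sqlite3, which returns NULL as None and TRUE as 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# sqlite3 returns SQL NULL as Python None and boolean results as 0/1.
in_match = conn.execute("SELECT 1 IN (1, NULL)").fetchone()[0]        # 1: a match is found
in_no_match = conn.execute("SELECT 1 IN (2, NULL)").fetchone()[0]     # None: unknown, not false
not_in = conn.execute("SELECT 1 NOT IN (2, NULL)").fetchone()[0]      # None: also unknown

print(in_match, in_no_match, not_in)  # 1 None None
```

WHERE keeps only rows whose condition is TRUE. So once the subquery yields a NULL, x NOT IN (subquery) returns no rows at all for any x that matches nothing.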
This is a standard. The SQL 1992 standard states
8.4 <in predicate>
[...]
<in predicate> ::=
<row value constructor>
[ NOT ] IN <in predicate value>
[...]
2) Let RVC be the <row value constructor> and let IPV be the <in predicate value>.
[...]
4) The expression
RVC IN IPV
is equivalent to
RVC = ANY IPV
So in fact, the behaviour of the <in predicate> is defined in terms of 8.7 <quantified comparison predicate>. In other words, Oracle correctly implements the SQL standard here.
Perhaps one of the linked articles points this out, but isn't it true that, when looking for a match (=), the two return the same thing? However, if you are looking for a range of answers (>, <, etc.), you cannot use IN and would have to use ANY...
I'm a newbie, so forgive me if I've missed something obvious...
MySQL explains ANY pretty well in its documentation:
The ANY keyword, which must follow a comparison operator, means “return TRUE if the comparison is TRUE for ANY of the values in the column that the subquery returns.” For example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (21,14,7) because there is a value 7 in t2 that is less than 10. The expression is FALSE if table t2 contains (20,10), or if table t2 is empty. The expression is unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).
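The four cases in that quote follow from three-valued logic. Below is a Python model of x > ANY (...), with None standing in for NULL/unknown. The function name gt_any is made up, and the code models the documented semantics; it is not MySQL's implementation:

```python
from typing import Iterable, Optional

def gt_any(x: Optional[int], values: Iterable[Optional[int]]) -> Optional[bool]:
    """Model of 'x > ANY (subquery)' in SQL three-valued logic; None means NULL/unknown."""
    saw_unknown = False
    for v in values:
        if x is None or v is None:
            saw_unknown = True           # any comparison with NULL is unknown
        elif x > v:
            return True                  # one TRUE comparison is enough
    return None if saw_unknown else False  # no TRUE found: unknown if a NULL was seen, else FALSE

print(gt_any(10, [21, 14, 7]))          # True  (7 < 10)
print(gt_any(10, [20, 10]))             # False
print(gt_any(10, []))                   # False (empty subquery)
print(gt_any(10, [None, None, None]))   # None  (unknown)
```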
https://dev.mysql.com/doc/refman/5.5/en/any-in-some-subqueries.html
Also, Learning SQL by Alan Beaulieu states the following:
Although most people prefer to use IN, using = ANY is equivalent to using the IN operator.
I always use = ANY because, in some Oracle or MSSQL versions, an IN list is limited to 1000/999 elements, while = ANY () is not limited to 1000.
Nobody likes their SQL query crashing a web request.
So there is a practical difference.
The second reason is that it is the more modern form, as it parallels expressions like > ALL (...).
The third reason is that, to me as a non-native English speaker, "any" and "all" somehow feel more natural than IN.
Related
Let's say I have this table:
ID | LANG | NAME          | DEFAULT
1  | ENG  | Cinderella    | false
1  | ENG  | The Ash Bride | false
1  | FRE  | Cendrillon    | true
1  | GER  | Aschenputtel  | false
("The Ash Bride" is a made-up title, included to show that the same ID can have several names in the same language.)
The SQL query should return the name in the requested language or, if there is none, in the default language.
So if the user's settings are in German (GER), a look-up of the book should return the title "Aschenputtel", but if the user's settings are Spanish (SPA), it should return "Cendrillon".
This question raises the same issue, but the answer suggests a double join on the names list, one for "lang='[preferred]'" and one for the default value, with a COALESCE to pick the first non-null result. I am worried this would cause performance issues if the names list is long (50,000+ entries), since there cannot be a primary key (there can be many names per language). The question is also quite old, so I wonder whether there is a method more along the lines of
SELECT NAME WHERE ID=1 and (LANG='SPA' OR DEFAULT=true)
and return the first non-null result of the OR-clause. Ideally, something like:
(not functional)
SELECT COALESCE(SELECT NAME WHERE ID=1 and (LANG='SPA' OR DEFAULT=true));
would return
CENDRILLON
and
(not functional)
SELECT COALESCE(SELECT NAME WHERE ID=1 and (LANG='ENG' OR DEFAULT=true));
would return
CINDERELLA
THE ASH BRIDE
Is there a smooth way to get the expected result with one SQL query, without doing a double join on a long list? Or is a COALESCE over a double select the only answer?
You can order the rows so that 'SPA' comes before the default (if one exists) and then limit the result to one record.
The exact syntax depends on the actual DBMS you use, so the following is just an illustration of how it could look:
SELECT name
FROM elbat
WHERE id = 1
AND (lang = 'SPA'
OR "default")
ORDER BY CASE
WHEN lang = 'SPA' THEN
0
ELSE
1
END
LIMIT 1;
But I don't know whether this is any more performant. Check the execution plans for that.
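Here is a runnable sketch of the idea in SQLite via Python's sqlite3. It assumes the table from the question is named elbat and has a 0/1 "default" column, quoted because DEFAULT is a reserved word. The helper title_for is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elbat (id INTEGER, lang TEXT, name TEXT, "default" INTEGER);
    INSERT INTO elbat VALUES
        (1, 'ENG', 'Cinderella',    0),
        (1, 'ENG', 'The Ash Bride', 0),
        (1, 'FRE', 'Cendrillon',    1),
        (1, 'GER', 'Aschenputtel',  0);
""")

def title_for(book_id, lang):
    # Rows in the preferred language rank 0, the default row ranks 1; keep the first one.
    row = conn.execute("""
        SELECT name
        FROM elbat
        WHERE id = ? AND (lang = ? OR "default" = 1)
        ORDER BY CASE WHEN lang = ? THEN 0 ELSE 1 END
        LIMIT 1
    """, (book_id, lang, lang)).fetchone()
    return row[0] if row else None

print(title_for(1, "GER"))  # Aschenputtel
print(title_for(1, "SPA"))  # Cendrillon
```

For ENG, both English rows rank first, so LIMIT 1 returns only one of them. Which one you get is undefined unless you add a further ORDER BY key.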
select age from person where name in (select name from eats where pizza="mushroom")
I am not sure what to write for the "in". How should I solve this?
In this case the sub-select is equivalent to a join:
select age
from person p, eats e
where p.name = e.name and pizza='mushroom'
So you could translate it into:
$$\pi_{\text{age}}\big(\text{person}\ p \bowtie_{p.\text{name} = e.\text{name}} \sigma_{\text{pizza} = \text{'mushroom'}}(\text{eats}\ e)\big)$$
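The rewrite is exact in relational algebra, where relations are sets. SQL keeps duplicates, though, so the join can return more rows than the IN form. The sketch below uses made-up person/eats rows in SQLite (via Python's sqlite3), with Cal listed twice in eats:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER);
    CREATE TABLE eats (name TEXT, pizza TEXT);
    INSERT INTO person VALUES ('Amy', 16), ('Ben', 21), ('Cal', 33);
    INSERT INTO eats VALUES ('Amy', 'mushroom'), ('Ben', 'pepperoni'),
                            ('Cal', 'mushroom'), ('Cal', 'mushroom');
""")

subquery = conn.execute("""
    SELECT age FROM person
    WHERE name IN (SELECT name FROM eats WHERE pizza = 'mushroom')
    ORDER BY age
""").fetchall()

join = conn.execute("""
    SELECT p.age FROM person p, eats e
    WHERE p.name = e.name AND e.pizza = 'mushroom'
    ORDER BY p.age
""").fetchall()

print(subquery)                    # [(16,), (33,)]
print(join)                        # [(16,), (33,), (33,)]  duplicate eats row -> duplicate output
print(set(subquery) == set(join))  # True: equal under set semantics
```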
Here's my guess. I'm assuming that the set membership symbol is part of your relational algebra.
For base table r, C a column of both r & s, and x an unused name,
select ... from r where C in s
returns the same value as
select ... from r natural join (s) x
The use of in requires that s has one column. The in is true for a row of r exactly when its C equals the value in s. So the where keeps exactly the rows of r whose C equals the value in s. We assumed that s has column C, so the where keeps exactly the rows of r whose C equals the C of the row in r. Those are same rows that are returned by the natural join.
(For an expression like this where-in with C not a column of both r and s then this translation is not applicable. Similarly, the reverse translation is only applicable under certain conditions.)
How useful this particular translation is to you or whether you could simplify it or must complexify it depends on what variants of SQL & "relational algebra" you are using, what limitations you have on input expressions and other translation decisions you have made. If you use very straightforward and general translations then the output is more complex but obviously correct. If you translate using a lot of special case rules and ad hoc simplifications along the way then the output is simpler but the justification that the answer is correct is longer.
I've got an assignment with the following instructions:
Create a view named A11T1 (that's A-One-One-T-One, not A-L-L-T-L) that will display the concatenated name, JobTitle and Salary of the people who have a Cat value of N and whose salary is at least 30 percent higher than the average salary of all people who have a Cat value of N. The three column headings should be Name, JobTitle and Salary. The rows should be sorted in traditional phonebook order.
Note 1: As always, concatenated names must appear with one space between the first and last names.
Note 2: The concatenated names and job titles must be displayed in proper case (e.g., Mary Ellen Smith, Assistant Manager) for this task.
Note 3: Remember, the Person11 data is messy. Be sure to look for N and n when you are identifying the people with a Cat value of N.
What I have so far is:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME||' '||LNAME) AS "Name", INITCAP(JobTitle), Salary
FROM PERSON11
WHERE UPPER(CAT) = 'N'
GROUP by INITCAP(FNAME||' '||LNAME), INITCAP(JobTitle), Salary
HAVING SALARY >= 1.3 * ROUND(AVG(SALARY),0)
Order by LNAME, FNAME
This is the current error I'm getting:
Error at Command Line:7 Column:10
Error report:
SQL Error: ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
No matter how much I edit my code, it just won't create the view, and I've been stuck on this for hours! I appreciate any responses, even a point in the right direction.
Why do you need to "group by" concatenated name, job title and salary? Do you have more than one row per name?
Perhaps it's because you need to compute the average salary and that requires aggregation? You can't do everything in a single SELECT statement in SQL (at least not with simple tools - you seem to be in the early stages of learning and not looking to use window functions).
The "avg salary" needs to come from a subquery. Where you have >= 1.3 * round(...) you should have instead:
... >= 1.3 * (select avg(salary) from person11 where upper(cat) = 'N')
Note that the subquery must be enclosed in parentheses. In your code I see you use upper(cat) - is there a concern that cat may be upper or lower case? In that case it may be better to write
cat in ('n', 'N')
Avoid wrapping column values inside functions whenever possible (that often leads to worse performance). Also, I see no need to round the average salary in your requirements; in any case, what's the point of rounding to zero decimal places if you then multiply by 1.3? Rounding may actually lead to incorrect output.
EDIT: Sorry, to clarify: I think you are well on your way already. Use the subquery for the average salary, remove the group by (which doesn't hurt anything but is really unneeded), and if you care to, change the upper(cat) as I suggested; I think your query will work with these changes.
Good luck!
I think the easiest way uses window functions:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME || ' ' || LNAME) AS Name, INITCAP(JobTitle) AS JobTitle, Salary
FROM (SELECT p.*, AVG(SALARY) OVER () AS avg_salary
FROM PERSON11 p
WHERE UPPER(CAT) = 'N'
) p
WHERE SALARY >= 1.3 * avg_salary
ORDER BY LNAME, FNAME;
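You can try the same logic outside Oracle: filter the N rows, then compare each salary with 1.3 times their average. The sketch below runs in SQLite via Python's sqlite3 and uses the subquery form rather than a window function. The sample rows are invented. SQLite has no INITCAP, so the sketch registers a Python stand-in with create_function; it approximates Oracle's rules with str.title():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for Oracle's INITCAP (an approximation, e.g. around apostrophes).
conn.create_function("INITCAP", 1, lambda s: s.title() if s is not None else None)

conn.executescript("""
    CREATE TABLE person11 (fname TEXT, lname TEXT, jobtitle TEXT, salary REAL, cat TEXT);
    INSERT INTO person11 VALUES
        ('mary ellen', 'smith', 'assistant manager', 60000, 'N'),
        ('JOHN',       'adams', 'CLERK',             30000, 'n'),
        ('ann',        'baker', 'clerk',             30000, 'N'),
        ('tom',        'young', 'analyst',           90000, 'Y');

    -- Average over the N/n rows is 40000, so the cut-off is 1.3 * 40000 = 52000.
    CREATE VIEW a11t1 AS
    SELECT INITCAP(fname || ' ' || lname) AS "Name",
           INITCAP(jobtitle)              AS "JobTitle",
           salary                         AS "Salary"
    FROM person11
    WHERE UPPER(cat) = 'N'
      AND salary >= 1.3 * (SELECT AVG(salary) FROM person11 WHERE UPPER(cat) = 'N')
    ORDER BY lname, fname;
""")

print(conn.execute("SELECT * FROM a11t1").fetchall())
# [('Mary Ellen Smith', 'Assistant Manager', 60000.0)]
```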
USE Saleslogix
DECLARE @AssumedGrowth int
SET @AssumedGrowth = 28
SELECT
account,
employees as NumberIn2013,
@AssumedGrowth += employees as NumberIn2014
FROM sysdba.account
WHERE employees <> 'NULL'
and account like 'Shaw%'
It's telling me that += is invalid and only works with +. Can someone help me with getting this example to work as a compound operator? I don't know if it makes too much difference, but I am using 2005 Management Studio.
Also, if it's not a huge pain, could you add the same example with @AssumedGrowth as a percentage?
What you are trying to do is this:
SELECT account, employees as NumberIn2013,
(@AssumedGrowth = @AssumedGrowth + employees) as NumberIn2014
FROM sysdba.account
WHERE employees IS NOT NULL and account like 'Shaw%';
But I don't think it will work. I would instead suggest using the built-in capabilities, in particular row_number():
SELECT account, employees as NumberIn2013,
employees * POWER(1 + @AssumedGrowth/100.0, row_number() over (order by <field>) - 1) as NumberIn2014
FROM sysdba.account
WHERE employees IS NOT NULL and account like 'Shaw%';
Note that you need to specify the ordering for the results. Presumably, there is some sort of id or datetime column that has the appropriate order. Tables represent unordered sets, so there is no "first" row.
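The POWER expression compounds the growth once per row: row n gets employees * (1 + growth/100)^(n - 1). Below is a Python sketch of that arithmetic with made-up inputs; the project helper is hypothetical. In T-SQL, the 100.0 matters. Dividing an int variable by 100 would be integer division and yield 0.

```python
def project(employees, growth_pct, row_number):
    """employees * (1 + growth/100) ** (row_number - 1), mirroring the POWER(...) expression."""
    return employees * (1 + growth_pct / 100.0) ** (row_number - 1)

# Hypothetical rows, already in the ORDER BY order; ROW_NUMBER() starts at 1.
employees = [100, 100, 100]
projected = [project(e, 28, n) for n, e in enumerate(employees, start=1)]
print([round(p, 2) for p in projected])  # [100.0, 128.0, 163.84]
```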
As the two queries below show, both work fine. So I'm confused: why should we ever use BETWEEN, given that, according to w3schools, BETWEEN behaves differently in different databases?
SELECT *
FROM employees
WHERE salary BETWEEN 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000
AND salary <= 15000;
BETWEEN can help to avoid unnecessary reevaluation of the expression:
SELECT AVG(RAND(20091225) BETWEEN 0.2 AND 0.4)
FROM t_source;
---
0.1998
SELECT AVG(RAND(20091225) >= 0.2 AND RAND(20091225) <= 0.4)
FROM t_source;
---
0.3199
t_source is just a dummy table with 1,000,000 records.
Of course this can be worked around using a subquery, but in MySQL it's less efficient.
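The gap between 0.1998 and 0.3199 is what you'd expect if the two RAND calls act as independent draws. A single draw lands in [0.2, 0.4] with probability 0.2. Two independent draws satisfy r1 >= 0.2 AND r2 <= 0.4 with probability 0.8 * 0.4 = 0.32. The Python simulation below uses Python's random module rather than MySQL's RAND, so the figures will not match exactly:

```python
import random

N = 200_000
rng = random.Random(20091225)  # seed borrowed from the example; any seed works

# One draw per row, tested once (a chained comparison evaluates rng.random() once),
# which is what BETWEEN does with its left-hand expression.
once = sum(0.2 <= rng.random() <= 0.4 for _ in range(N)) / N

# Two draws per row: r1 >= 0.2 AND r2 <= 0.4. Python's `and` skips the second draw
# when the first test fails, which does not change the probability.
twice = sum((rng.random() >= 0.2) and (rng.random() <= 0.4) for _ in range(N)) / N

print(round(once, 3), round(twice, 3))  # roughly 0.2 and 0.32
```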
And of course, BETWEEN is more readable. Use it in three queries and you will remember the syntax forever.
In SQL Server and MySQL, LIKE against a constant with non-leading '%' is also a shorthand for a pair of >= and <:
SET SHOWPLAN_TEXT ON
GO
SELECT *
FROM master
WHERE name LIKE 'string%'
GO
SET SHOWPLAN_TEXT OFF
GO
|--Index Seek(OBJECT:([test].[dbo].[master].[ix_name_desc]), SEEK:([test].[dbo].[master].[name] < 'strinH' AND [test].[dbo].[master].[name] >= 'string'), WHERE:([test].[dbo].[master].[name] like 'string%') ORDERED FORWARD)
However, LIKE syntax is more legible.
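The rewrite works because every string that starts with a prefix sorts between the prefix itself and the prefix with its last character bumped by one. Below is a Python sketch of that bound under plain code-point ordering; prefix_range is a made-up helper. The plan above shows 'strinH' rather than 'strinh', presumably because that database uses a case-insensitive collation:

```python
def prefix_range(prefix):
    """Half-open range [low, high) covering exactly the strings that start with prefix.

    Assumes plain code-point ordering (a binary collation) and that the last
    character of prefix is not the largest possible code point.
    """
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

low, high = prefix_range("string")
print(low, high)  # string strinh

# The range test agrees with startswith() on a mix of matching and non-matching words.
words = ["strin", "string", "strings", "stringent", "strinh", "strinG", "stroke", "a"]
for w in words:
    assert w.startswith("string") == (low <= w < high)
```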
Using BETWEEN has extra merits when the expression that is compared is a complex calculation rather than just a simple column; it saves writing out that complex expression twice.
BETWEEN in T-SQL supports the NOT operator, so you can use constructions like
WHERE salary not between 5000 AND 15000;
In my opinion, that is clearer to a human than
WHERE salary < 5000 OR salary > 15000;
And finally, if you type the column name only once, you have fewer chances to make a mistake.
The version with "between" is easier to read. If I were to use the second version I'd probably write it as
5000 <= salary and salary <= 15000
for the same reason.
I vote for @Quassnoi's answer: correctness is a big win.
I usually find keywords more readable than symbols like <, <=, >, >=, != etc. Yes, we need (better, accurate) results. And at least I avoid visually misreading the symbols and reversing their meaning. If you use <= and sense logically incorrect output coming from your SELECT query, you may wander for some time before concluding that you wrote <= in place of >= [visual misinterpretation?]. Hope I am clear.
And aren't we shortening the code (along with making it look more high-level), which makes it more concise and easier to maintain?
SELECT *
FROM employees
WHERE salary between 5000 AND 15000;
SELECT *
FROM employees
WHERE salary >= 5000 AND salary <= 15000;
First query uses only 10 words and second uses 12!
Personally, I wouldn't use BETWEEN, simply because there seems to be no clear definition of whether it should include or exclude the values that bound the condition. In your example:
SELECT *
FROM employees
WHERE salary between 5000 AND 15000;
The range could include the 5000 and 15000, or it could exclude them.
Syntactically, I think it should exclude them, since the values themselves are not between the given numbers. But that is only my opinion, whereas operators such as >= are very specific, and less likely to change between databases or between increments/versions of the same database.
Edited in response to Pavel and Jonathan's comments.
As noted by Pavel, ANSI SQL (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), as far back as 1992, mandates that the endpoints be included in the range, equivalent to X >= lower_bound AND X <= upper_bound:
8.3 <between predicate>
Function
Specify a range comparison.
Format
<between predicate> ::=
<row value constructor> [ NOT ] BETWEEN
<row value constructor> AND <row value constructor>
Syntax Rules
1) The three <row value constructor>s shall be of the same degree.
2) Let respective values be values with the same ordinal position in the two <row value constructor>s.
3) The data types of the respective values of the three <row value constructor>s shall be comparable.
4) Let X, Y, and Z be the first, second, and third <row value constructor>s, respectively.
5) "X NOT BETWEEN Y AND Z" is equivalent to "NOT ( X BETWEEN Y AND Z )".
6) "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z".
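Rules 5 and 6 are easy to confirm on a real engine. The sketch below runs in SQLite via Python's sqlite3, with a few made-up salaries around the endpoints. It also shows that swapping the bounds yields no rows, because rule 6 turns the condition into X >= 15000 AND X <= 5000:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(4999,), (5000,), (10000,), (15000,), (15001,)])

def salaries(where):
    # Only ever called with the fixed condition strings below, never with user input.
    return [r[0] for r in conn.execute(f"SELECT salary FROM employees WHERE {where} ORDER BY salary")]

between = salaries("salary BETWEEN 5000 AND 15000")
pair = salaries("salary >= 5000 AND salary <= 15000")
outside = salaries("salary NOT BETWEEN 5000 AND 15000")
reversed_bounds = salaries("salary BETWEEN 15000 AND 5000")

print(between)          # [5000, 10000, 15000]  endpoints included
print(pair)             # [5000, 10000, 15000]
print(outside)          # [4999, 15001]
print(reversed_bounds)  # []
```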
If the endpoints are inclusive, then BETWEEN is the preferred syntax.
Fewer references to a column means fewer spots to update when things change. It's the engineering principle that fewer parts mean fewer things can break.
It also means less chance of someone putting a bracket in the wrong place when, for instance, adding an OR, e.g.:
WHERE salary BETWEEN 5000 AND (15000
OR ...)
...you'll get an error if you put the bracket around the AND part of a BETWEEN statement. Versus:
WHERE salary >= 5000
AND (salary <= 15000
OR ...)
...you'd only know there's a problem when someone reviews the data returned from the query.
Semantically, the two expressions have the same result.
However, BETWEEN is a single predicate, instead of two comparison predicates combined with AND. Depending on the optimizer provided by your RDBMS, a single predicate may be easier to optimize than two predicates.
That said, I expect most modern RDBMS implementations to optimize the two expressions identically.
It gets worse if the tested value is something like:
SELECT id FROM entries
WHERE
(SELECT COUNT(id) FROM anothertable LEFT JOIN something ON ... WHERE something)
BETWEEN entries.min AND entries.max;
Rewrite this one with your syntax without using temporary storage.
I'd rather use the second one, since you always know whether it's <= or <.
In SQL, I agree that BETWEEN is mostly unnecessary, and can be emulated syntactically with 5000 <= salary AND salary <= 15000. It is also limited: I often want to apply an inclusive lower bound and an exclusive upper bound, such as @start <= when AND when < @end, which you can't do with BETWEEN.
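The half-open form matters because, with BETWEEN, a value that lies exactly on a boundary falls into two adjacent ranges. The sketch below runs in SQLite via Python's sqlite3. The event timestamps are made up and stored as ISO-8601 strings, which sort chronologically, and a hypothetical happened_at column stands in for when:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (happened_at TEXT)")  # ISO-8601 strings sort chronologically
conn.executemany("INSERT INTO events VALUES (?)", [
    ("2024-01-31 23:59:59",), ("2024-02-01 00:00:00",),
    ("2024-02-15 12:00:00",), ("2024-03-01 00:00:00",),
])

def count(where, params):
    return conn.execute(f"SELECT COUNT(*) FROM events WHERE {where}", params).fetchone()[0]

months = [("2024-01-01 00:00:00", "2024-02-01 00:00:00"),
          ("2024-02-01 00:00:00", "2024-03-01 00:00:00"),
          ("2024-03-01 00:00:00", "2024-04-01 00:00:00")]

# Inclusive on both ends: a row exactly on a month boundary is counted in two months.
closed = [count("happened_at BETWEEN ? AND ?", (lo, hi)) for lo, hi in months]
# Inclusive lower bound, exclusive upper bound: every row is counted exactly once.
half_open = [count(":start_ts <= happened_at AND happened_at < :end_ts",
                   {"start_ts": lo, "end_ts": hi}) for lo, hi in months]

print(closed, sum(closed))        # [2, 3, 1] 6  boundary rows double-counted
print(half_open, sum(half_open))  # [1, 2, 1] 4  matches the 4 rows
```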
OTOH, BETWEEN is convenient if the value being tested is the result of a complex expression.
It would be nice if SQL and other languages followed Python's lead in using proper mathematical notation: 5000 <= salary <= 15000.
One small tip that will make your code more readable: use < and <= in preference to > and >=.