IN statement transition - sql

Using a subquery, list all male employees with salaries higher than that of any female employee. Does this look right? I also need to change it to an IN statement, but everything I've tried returns all female salaries.
SELECT FirstName, LastName, Salary
FROM Employee
WHERE Salary > ALL
(Select Salary FROM Employee
WHERE Gender = 'F')

Right, for the record, I think Gordon's answer is technically much better than what follows. He's doing it right. The requirement to use IN is frankly bizarre.
Since x > ALL y translates naturally to NOT EXISTS y such that x <= y, and EXISTS can be translated relatively naturally to IN, we can write this as x NOT IN (the set of z such that z <= some y). Assuming the Employee table has a PK field called EmployeeID, you can do this as follows (I threw in some bonus INs for you for free):
SELECT
    m.FirstName,
    m.LastName,
    m.Salary
FROM
    Employee m
WHERE
    m.Gender IN ('M')
    AND m.Salary IS NOT NULL
    AND m.EmployeeID NOT IN
    (
        -- everyone who earns no more than at least one woman; a join is used
        -- because a bare scalar subquery here would return several rows and fail
        SELECT m2.EmployeeID
        FROM Employee m2
        JOIN Employee f
            ON m2.Salary <= f.Salary
        WHERE f.Gender IN ('F')
    )
    AND 'Why' IN ('Why','IN','Why','?')
    AND 'Why' NOT IN ('Another','Way','?')
This is horrid and much less efficient than Gordon's answer.
If this is part of an assignment for a course, one has to wonder what the person setting this assignment was thinking. There's the obvious possibility that they simply don't know what they are doing, but perhaps they wanted you to jump through some logical hoops or to teach you that it's often possible to achieve the same thing in multiple different ways and that these may or may not be equally efficient.
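A quick way to convince yourself that the IN-only rewrite and a plain MAX() comparison agree is to run both against a few rows. The sketch below uses Python's built-in sqlite3 module as a stand-in database (SQLite has no ALL or ANY, so only the IN and MAX() forms are run); the Employee columns and all data are hypothetical.

```python
import sqlite3

def make_db(rows):
    """Build an in-memory Employee table (hypothetical schema based on the question)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, "
        "LastName TEXT, Gender TEXT, Salary INTEGER)"
    )
    con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)
    return con

# IN-only rewrite: exclude every employee who earns no more than at least one woman.
IN_ONLY = """
SELECT m.FirstName, m.LastName, m.Salary
FROM Employee m
WHERE m.Gender IN ('M')
  AND m.Salary IS NOT NULL
  AND m.EmployeeID NOT IN (
        SELECT m2.EmployeeID
        FROM Employee m2
        JOIN Employee f ON m2.Salary <= f.Salary
        WHERE f.Gender IN ('F'))
ORDER BY m.Salary
"""

# MAX() version: compare with the single highest female salary.
WITH_MAX = """
SELECT m.FirstName, m.LastName, m.Salary
FROM Employee m
WHERE m.Gender = 'M'
  AND m.Salary > (SELECT MAX(f.Salary) FROM Employee f WHERE f.Gender = 'F')
ORDER BY m.Salary
"""

SAMPLE = [  # made-up rows
    (1, "Ann", "Lee", "F", 50000),
    (2, "Bea", "Kim", "F", 70000),
    (3, "Carl", "Ng", "M", 60000),
    (4, "Dan", "Ito", "M", 80000),
    (5, "Eli", "Roe", "M", 90000),
]

con = make_db(SAMPLE)
print(con.execute(IN_ONLY).fetchall())
print(con.execute(WITH_MAX).fetchall())
```

Both queries return the same two men here; they differ only in how much work the database has to do to get there.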

First, I would write this as:
SELECT m.*
FROM Employee m
WHERE m.Gender = 'M' AND
m.Salary > (SELECT MAX(f.Salary)
FROM Employee f
WHERE f.Gender = 'F'
);
I like this version because it is quite explicit. It is almost a direct translation of "Get all males that earn more than the highest paid female."
Your version also works, under the assumption that gender only takes on two values. However, it relies on a subtlety: the comparison is a strict >, so no female can have a salary higher than that of all females (including herself). Hence, the outer query can only return non-females, which happen to be males.
Instead of relying on those mental gymnastics, add the condition Gender = 'M', which makes the intention of the query much clearer:
SELECT m.*
FROM Employee m
WHERE m.Gender = 'M' AND
m.Salary > ALL (SELECT f.Salary
FROM Employee f
WHERE f.Gender = 'F'
);
I should note that this version is better than the version with MAX() in one edge case: when there are no females, this version returns all male employees, while the version with MAX() returns no rows at all.
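The empty-set edge case can be reproduced in a few lines. This sketch again uses Python's sqlite3 module with hypothetical data; because SQLite has no ALL, the > ALL query is written as the equivalent NOT EXISTS (equivalent only while no female salary is NULL).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Gender TEXT, Salary INTEGER)")
# Hypothetical data with no female employees at all.
con.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [(1, "M", 40000), (2, "M", 55000)])

# Stand-in for "Salary > ALL (female salaries)": true when no woman earns as much.
all_style = con.execute("""
    SELECT m.EmployeeID FROM Employee m
    WHERE m.Gender = 'M'
      AND NOT EXISTS (SELECT 1 FROM Employee f
                      WHERE f.Gender = 'F' AND f.Salary >= m.Salary)
    ORDER BY m.EmployeeID""").fetchall()

# MAX() over an empty set is NULL, and "Salary > NULL" is never true.
max_style = con.execute("""
    SELECT m.EmployeeID FROM Employee m
    WHERE m.Gender = 'M'
      AND m.Salary > (SELECT MAX(f.Salary) FROM Employee f WHERE f.Gender = 'F')
    ORDER BY m.EmployeeID""").fetchall()

print(all_style)  # [(1,), (2,)]
print(max_style)  # []
```

The flip side: if any female salary is NULL, > ALL evaluates to UNKNOWN for every man and returns no rows, whereas MAX() simply ignores the NULL.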
I should add: It is not obvious how to turn this query into a version using IN, at least in a sensible way.

Related

Using SQL to assign grades to marks from a lookup table

I am trying to write a query to assign a grade to a percentage.
I have a table of percentages per student per paper (StuID, pct, paperID) and a table of grade boundaries (paperID, minScore, maxScore, Grade).
The idea is to have a query that gives me the student name and the grade whose min and max scores bracket the pct.
This is ridiculously easy (a lookup) in a spreadsheet, and seems to be ridiculously hard in SQL. I am really trying to avoid exporting to Excel and calculating there, or hard-coding the boundaries within a selection, but at the moment those seem to be my only options.
Any suggestions to keep this (a) in SQL and (b) as generalised as possible (i.e. I am going to want to reuse the query with different grade boundaries)?
Due to (ridiculous) software constraints at work, I am limited to MS Access for my DB needs.
Thanks
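Just be careful with how you handle the range boundaries: check the >= and < conditions and adjust them to your case. Also match the boundaries on paperID, and compare pct (the student table has no grade column). MS Access needs INNER JOIN rather than a bare JOIN:
SELECT S.StuID, S.paperID, S.pct, G.Grade
FROM Student AS S
INNER JOIN Grades AS G
    ON (S.paperID = G.paperID
        AND S.pct >= G.minScore
        AND S.pct < G.maxScore)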
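A minimal sketch of the same range join, run with Python's sqlite3 module instead of Access (so the syntax is close but not identical). The boundary rows are hypothetical and stored as half-open ranges [minScore, maxScore), which is what the >= / < pair implements.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (StuID INTEGER, pct REAL, paperID INTEGER)")
con.execute("CREATE TABLE Grades (paperID INTEGER, minScore REAL, maxScore REAL, Grade TEXT)")

# Hypothetical boundaries for paper 1; each range is [minScore, maxScore).
con.executemany("INSERT INTO Grades VALUES (?, ?, ?, ?)", [
    (1, 0, 50, "C"),
    (1, 50, 70, "B"),
    (1, 70, 101, "A"),  # upper bound above 100 so that pct = 100 still matches
])
con.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    (10, 49.5, 1), (11, 50, 1), (12, 100, 1),
])

rows = con.execute("""
    SELECT S.StuID, S.pct, G.Grade
    FROM Student AS S
    INNER JOIN Grades AS G
      ON S.paperID = G.paperID
     AND S.pct >= G.minScore
     AND S.pct < G.maxScore
    ORDER BY S.StuID""").fetchall()
print(rows)
```

A score of exactly 50 lands in "B", not "C", because the lower bound is inclusive and the upper bound exclusive. New grade boundaries only mean new rows in Grades; the query itself does not change.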

Query complex in Oracle SQL

I have the following tables and their fields
I've been asked for a query that seems quite complex to me. I have been going around in circles for two days trying things. The task says:
We want to obtain the average age of female athletes who won medals (gold, silver or bronze) in the different events of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data for some specific value in the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age is calculated by subtracting the athlete's date of birth from the system date (SYSDATE) and dividing that value by 365. To avoid showing decimals, truncate (TRUNC) the result of the age calculation. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And I got this output.
When I add the person.birthdate field, instead of getting 18 records for the 18 people who have a medal, I get many more records.
Apart from that, I still have to compute the average age with SYSDATE and TRUNC, which I have tried in many ways without success.
It looks very complicated to me, or maybe I'm just a bit burned out from going around in circles. I need some help.
Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, see if it helps.
select p.gender,
trunc((sysdate - p.birthdate) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
When two dates are subtracted, the result is a number of days. Dividing it by 365 gives, roughly, the number of years. (That treats every year as 365 days long, which is a simplification, since not all years have that many days; hint: leap years.) The result is usually a decimal number, e.g. 23.912874918724. You were told to remove the decimals, so use TRUNC and get 23 as the result.
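The arithmetic can be checked outside the database. This Python sketch mirrors TRUNC((SYSDATE - birthdate) / 365) with datetime.date subtraction and math.trunc; the dates are made up, and the second call shows the leap-year drift the hint refers to.

```python
import math
from datetime import date

def age_in_years(birth: date, today: date) -> int:
    """Approximation used by the task: TRUNC((SYSDATE - birthdate) / 365)."""
    days = (today - birth).days      # like subtracting two DATE values in Oracle
    return math.trunc(days / 365)    # drop the decimals, like TRUNC

print(age_in_years(date(2000, 1, 1), date(2023, 12, 1)))  # 23
# Five days before the 20th birthday, leap days already push the result to 20:
print(age_in_years(date(2000, 1, 10), date(2020, 1, 5)))  # 20 (the true age is 19)
```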
Although the data model contains 5 tables, you don't have to use all of them in a query. The best approach is probably to go step by step. The first step is simply to select all female athletes and calculate their age:
select p.gender,
trunc((sysdate - p.birthdate) / 365) age
from person p
where p.gender = 'F'
Note that I've used table aliases; I'd suggest you use them too, as they make queries easier to read (long table names don't help readability). Also, always prefix columns with their table alias to avoid confusion about which column belongs to which table.
Once you're satisfied with that result, move on to another table: athlete. It is here just as a joining mechanism with the score table, which contains... well, scores. Note that I've used an outer join for the score table because not all athletes have won a medal. I presume that this is what the task you've been given means by:
... even when there is no data of any specific value for the set of records displayed by the query.
Keep in mind, though, that the filters on s.score and c.idsport in the WHERE clause discard the NULL-extended rows an outer join produces, so in the query above the outer joins behave like inner joins.
It is recommended that we, as developers, use explicit table joins, which let you see all the join conditions separated from the filters (which belong in the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time, and don't skip steps. Move on to the next one only when you're sure that the previous step's result is correct because, in most cases, it won't automagically fix itself.
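For reference, here is roughly where those steps end up once the medal filter and the grouping are added. It is a sketch run in SQLite through Python's sqlite3 module, not the Oracle answer: a julianday() difference stands in for SYSDATE - birthdate, CAST(... AS INTEGER) stands in for TRUNC, and a fixed date replaces SYSDATE so the result is reproducible. The table layouts and data are hypothetical, since the question's table definitions aren't shown, and inner joins are used because only medal winners are wanted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (idperson INTEGER PRIMARY KEY, gender TEXT, birthdate TEXT);
CREATE TABLE athlete (idathlete INTEGER PRIMARY KEY);
CREATE TABLE competition (idcompetition INTEGER PRIMARY KEY, idsport INTEGER);
CREATE TABLE score (idathlete INTEGER, idcompetition INTEGER, score TEXT);

INSERT INTO person VALUES (1, 'F', '2000-03-01'), (2, 'F', '1996-07-15'),
                          (3, 'F', '2002-01-20'), (4, 'M', '1999-05-05');
INSERT INTO athlete VALUES (1), (2), (3), (4);
INSERT INTO competition VALUES (100, 18), (200, 7);
INSERT INTO score VALUES (1, 100, 'Gold'), (2, 100, 'Gold'), (3, 100, 'Silver'),
                         (3, 200, 'Bronze'), (4, 100, 'Gold');
""")

# Truncated age per athlete, averaged per gender and medal.
rows = con.execute("""
    SELECT p.gender, s.score,
           AVG(CAST((julianday('2024-06-01') - julianday(p.birthdate)) / 365 AS INTEGER)) AS avg_age
    FROM person p
    JOIN athlete a ON a.idathlete = p.idperson
    JOIN score s ON s.idathlete = a.idathlete
    JOIN competition c ON c.idcompetition = s.idcompetition
    WHERE p.gender = 'F'
      AND s.score IN ('Gold', 'Silver', 'Bronze')
      AND c.idsport = 18
    GROUP BY p.gender, s.score
    ORDER BY avg_age""").fetchall()
print(rows)
```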

Oracle SQL Developer Create View Assignment

I've got an assignment with the following instructions:
Create a view named A11T1 (that's A-One-One-T-One, not A-L-L-T-L) that will display the concatenated name, JobTitle and Salary of the people who have a Cat value of N and whose salary is at least 30 percent higher than the average salary of all people who have a Cat value of N. The three column headings should be Name, JobTitle and Salary. The rows should be sorted in traditional phonebook order.
Note 1: As always, concatenated names must appear with one space between the first and last names.
Note 2: The concatenated names and job titles must be displayed in proper case (e.g., Mary Ellen Smith, Assistant Manager) for this task.
Note 3: Remember, the Person11 data is messy. Be sure to look for N and n when you are identifying the people with a Cat value of N.
What I have so far is:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME||' '||LNAME) AS "Name", INITCAP(JobTitle), Salary
FROM PERSON11
WHERE UPPER(CAT) = 'N'
GROUP by INITCAP(FNAME||' '||LNAME), INITCAP(JobTitle), Salary
HAVING SALARY >= 1.3 * ROUND(AVG(SALARY),0)
Order by LNAME, FNAME
Error at Command Line:7 Column:10 Error report: SQL Error: ORA-00979: not a GROUP BY expression 00979. 00000 - "not a GROUP BY expression"
That is the error I'm currently getting.
No matter how much I edit my code, it just won't create the view, and I've been stuck on this for hours! I appreciate any responses, even a point in the right direction.
Why do you need to "group by" concatenated name, job title and salary? Do you have more than one row per name?
Perhaps it's because you need to compute the average salary, and that requires aggregation? You can't do everything in a single SELECT statement in SQL (at least not with simple tools; you seem to be in the early stages of learning and not looking to use window functions).
The "avg salary" needs to come from a subquery. Where you have >= 1.3 * round(...) you should have instead:
... >= 1.3 * (select avg(salary) from person11 where upper(cat) = 'N')
Note that the subquery must be enclosed in parentheses. In your code I see you use upper(cat); is there a concern that cat may be upper or lower case? In that case it may be better to write
cat in ('n', 'N')
Avoid wrapping column values inside functions whenever possible (that often leads to worse performance, since an ordinary index on the column can't be used). Also, I see no need to round the average salary in your requirements; in any case, what's the point of rounding to zero decimal places if you then multiply by 1.3? Rounding may actually lead to incorrect output.
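EDIT: Sorry, to clarify: I think you are well on your way already. Use the subquery for the average salary, remove the GROUP BY (it is unneeded, and it is what triggers ORA-00979: your ORDER BY uses LNAME and FNAME, which are not in the GROUP BY list), give INITCAP(JobTitle) a column alias (a view requires one for every expression), and, if you care to, change the upper(cat) as I suggested. I think your query will work with these changes.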
Good luck!
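The suggested shape (no GROUP BY, average from a scalar subquery, case-insensitive Cat filter) can be tried in SQLite through Python's sqlite3 module. SQLite has no INITCAP, so proper-casing is left out; the PERSON11 columns and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PERSON11 (FNAME TEXT, LNAME TEXT, JobTitle TEXT, Cat TEXT, Salary REAL)")
# Hypothetical, deliberately messy data: Cat is sometimes 'n', sometimes 'N'.
con.executemany("INSERT INTO PERSON11 VALUES (?, ?, ?, ?, ?)", [
    ("ann", "smith", "clerk", "N", 30000),
    ("bob", "jones", "analyst", "n", 40000),
    ("cy", "adams", "assistant manager", "N", 80000),
    ("al", "baker", "engineer", "n", 75000),
    ("dee", "brown", "director", "Y", 200000),
])

# No GROUP BY: the average comes from a scalar subquery over the Cat = N/n rows.
con.execute("""
CREATE VIEW A11T1 AS
SELECT FNAME || ' ' || LNAME AS "Name", JobTitle AS "JobTitle", Salary AS "Salary"
FROM PERSON11
WHERE Cat IN ('n', 'N')
  AND Salary >= 1.3 * (SELECT AVG(Salary) FROM PERSON11 WHERE Cat IN ('n', 'N'))
ORDER BY LNAME, FNAME
""")
print(con.execute("SELECT * FROM A11T1").fetchall())
```

The average of the four N/n salaries is 56250, so only salaries of at least 73125 qualify.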
I think the easiest way uses window functions:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME || ' ' || LNAME) AS "Name",
       INITCAP(JobTitle) AS "JobTitle",
       Salary AS "Salary"
FROM (SELECT p.*, AVG(SALARY) OVER () AS avg_salary
      FROM PERSON11 p
      WHERE UPPER(CAT) = 'N'
     ) p
WHERE SALARY >= 1.3 * avg_salary
ORDER BY LNAME, FNAME;

SQL Where argument returning as numeric and not boolean

I'm new to SQL and I've been encountering the error "argument of WHERE must be type boolean, not type numeric" from the following code:
select subject_no, subject_name, class_size from Subject
where (select AVG(mark) from Grades where mark > 75)
group by subject_no
order by subject_no asc
To help in understanding the question: what I am attempting to do is list subjects with an average mark less than 75.
By my understanding, though, the WHERE argument would be Boolean, as the average mark of a class would be either above or below 75 and therefore true or false. Any assistance in correcting my understanding is greatly appreciated.
Use a correlated subquery to find subjects with AVG(mark) < 75. There is no need for GROUP BY, since the outer query uses no aggregate functions; use DISTINCT instead to remove duplicates:
select distinct subject_no, subject_name, class_size
from Subject s
where (select AVG(mark) from grades g
where g.subject_no = s.subject_no) < 75
order by subject_no asc
Note: I assumed there's a subject_no column in the Grades table too.
First of all, the return value of (select AVG(mark) from Grades where mark > 75) is not Boolean, as you assumed. It is exactly the value of AVG(mark) itself, just as
select 1+1 from dual returns 2, and select 'hello world ' from dual returns exactly the string hello world.
So, if you want to list subjects with an average mark less than 75, the WHERE condition has to be a comparison, something more like:
mark<(select AVG(mark) from Grades where mark > 75)
A comparison like this evaluates to a Boolean.
However, your explanation of the question is hard to grasp :P
I guess it takes a little time to understand SQL when you are not yet familiar with it. Good luck. If you could explain your question more precisely, it would be much easier to give you the answer you are looking for.
You have more errors in your query: you have to correlate the Grades subquery with the Subject table, move the comparison outside the subquery (and change > to <), and finally remove the GROUP BY clause:
select subject_no, subject_name, class_size from Subject
where (select AVG(mark) from Grades where Grades.subject_no = Subject.subject_no) < 75
order by subject_no asc
You can also rewrite your SELECT statement as a join against a grouped derived table, which computes each subject's average once (and additionally gives you the average mark in the result set):
select s.subject_no, s.subject_name, s.class_size, g.avg_mark
from Subject s
join (select subject_no, AVG(mark) avg_mark
from Grades
group by subject_no
having AVG(mark) < 75) g on g.subject_no = s.subject_no
order by s.subject_no asc
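To check that the correlated subquery and the join against the grouped derived table pick the same subjects, here is a sketch using Python's sqlite3 module; the Subject and Grades rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Subject (subject_no INTEGER PRIMARY KEY, subject_name TEXT, class_size INTEGER);
CREATE TABLE Grades (subject_no INTEGER, mark REAL);
INSERT INTO Subject VALUES (1, 'Maths', 30), (2, 'History', 25), (3, 'Art', 12);
INSERT INTO Grades VALUES (1, 60), (1, 70), (2, 90), (2, 80), (3, 74), (3, 75);
""")

# Correlated form: the subquery is re-evaluated for each Subject row.
correlated = con.execute("""
    SELECT subject_no, subject_name, class_size FROM Subject
    WHERE (SELECT AVG(mark) FROM Grades WHERE Grades.subject_no = Subject.subject_no) < 75
    ORDER BY subject_no ASC""").fetchall()

# Join form: averages are computed once per subject, then joined back.
joined = con.execute("""
    SELECT s.subject_no, s.subject_name, s.class_size, g.avg_mark
    FROM Subject s
    JOIN (SELECT subject_no, AVG(mark) AS avg_mark
          FROM Grades GROUP BY subject_no
          HAVING AVG(mark) < 75) g ON g.subject_no = s.subject_no
    ORDER BY s.subject_no ASC""").fetchall()

print(correlated)
print(joined)
```

Maths (average 65) and Art (average 74.5) qualify; History (average 85) does not.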

Oracle: '= ANY()' vs. 'IN ()'

I just stumbled upon something in Oracle SQL (not sure if it's in others) that I am curious about. I am asking here as a wiki, since it's hard to search for symbols in Google...
I just found that when checking a value against a set of values you can do
WHERE x = ANY (a, b, c)
As opposed to the usual
WHERE x IN (a, b, c)
So I'm curious, what is the reasoning for these two syntaxes? Is one standard and one some oddball Oracle syntax? Or are they both standard? And is there a preference of one over the other for performance reasons, or ?
Just curious what anyone can tell me about that '= ANY' syntax.
ANY (or its synonym SOME) is syntactic sugar for EXISTS with a simple correlation:
SELECT *
FROM mytable
WHERE x <= ANY
(
SELECT y
FROM othertable
)
is the same as:
SELECT *
FROM mytable m
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE m.x <= o.y
)
With the equality condition on a not-nullable field, it becomes similar to IN.
All major databases, including SQL Server, MySQL and PostgreSQL, support this keyword.
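The equivalence can be illustrated with the EXISTS form from the answer. SQLite (used here through Python's sqlite3 module) has no ANY, so the "x <= ANY (...)" meaning is spelled out in Python with any() and compared with the EXISTS query; the values are hypothetical and contain no NULLs.

```python
import sqlite3

xs = [1, 5, 9]  # hypothetical mytable.x values
ys = [3, 6]     # hypothetical othertable.y values

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (x INTEGER)")
con.execute("CREATE TABLE othertable (y INTEGER)")
con.executemany("INSERT INTO mytable VALUES (?)", [(x,) for x in xs])
con.executemany("INSERT INTO othertable VALUES (?)", [(y,) for y in ys])

# EXISTS form of "x <= ANY (SELECT y FROM othertable)".
exists_rows = [r[0] for r in con.execute("""
    SELECT m.x FROM mytable m
    WHERE EXISTS (SELECT NULL FROM othertable o WHERE m.x <= o.y)
    ORDER BY m.x""")]

# The same predicate in Python: keep x if it is <= at least one y.
any_rows = sorted(x for x in xs if any(x <= y for y in ys))

print(exists_rows, any_rows)
```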
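IN - equal to any member in the list
ANY - compare the value to **each** value returned by the subquery; true if at least one comparison is true
ALL - compare the value to **EVERY** value returned by the subquery; true only if all comparisons are true
<ANY() - less than the maximum
>ANY() - more than the minimum
=ANY() - equivalent to IN
>ALL() - more than the maximum
<ALL() - less than the minimum
These maximum/minimum shortcuts assume the subquery returns at least one row and no NULLs.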
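The maximum/minimum shortcuts in the list above can be sanity-checked in plain Python, with any() and all() playing the roles of ANY and ALL over a non-empty, NULL-free list of random values. The last line shows why an empty subquery result breaks the shortcuts.

```python
import random

def check_shortcuts(x, values):
    """Compare quantified predicates with their MIN/MAX shortcuts (non-empty, NULL-free list)."""
    any_lt = any(x < v for v in values)  # x < ANY (values)
    any_gt = any(x > v for v in values)  # x > ANY (values)
    all_gt = all(x > v for v in values)  # x > ALL (values)
    all_lt = all(x < v for v in values)  # x < ALL (values)
    return (any_lt == (x < max(values)) and any_gt == (x > min(values))
            and all_gt == (x > max(values)) and all_lt == (x < min(values)))

rng = random.Random(0)
for _ in range(1000):
    vals = [rng.randint(0, 10) for _ in range(rng.randint(1, 5))]
    assert check_shortcuts(rng.randint(-1, 11), vals)

# Empty set: ANY is false and ALL is true, but there is no max/min to compare with.
print(any(5 > v for v in []), all(5 > v for v in []))  # False True
```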
For example:
Find the employees whose salary equals the minimum salary of some department (not necessarily their own):
SELECT last_name, salary,department_id
FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);
Employees who are not IT programmers and whose salary is less than that of at least one IT programmer (i.e. less than the highest IT_PROG salary):
SELECT employee_id, last_name, salary, job_id
FROM employees
WHERE salary <ANY
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Employees whose salary is less than the salary of all employees with a job ID of IT_PROG and whose job is not IT_PROG:
SELECT employee_id,last_name, salary,job_id
FROM employees
WHERE salary <ALL
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Hope it helps.
-Noorin Fatima
To put it simply and quoting from O'Reilly's "Mastering Oracle SQL":
"Using IN with a subquery is functionally equivalent to using ANY, and returns TRUE if a match is found in the set returned by the subquery."
"We think you will agree that IN is more intuitive than ANY, which is why IN is almost always used in such situations."
Hope that clears up your question about ANY vs IN.
I believe that what you are looking for is this:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/opt_ops.htm#1005298
(Link found on Eddie Awad's Blog)
To sum it up here:
last_name IN ('SMITH', 'KING', 'JONES')
is transformed into
last_name = 'SMITH' OR last_name = 'KING' OR last_name = 'JONES'
while
salary > ANY (:first_sal, :second_sal)
is transformed into
salary > :first_sal OR salary > :second_sal
The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery.
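Both rewrites can be tried on a tiny table. This sketch uses Python's sqlite3 module with hypothetical rows; the IN list and the OR chain are run directly, while "salary > ANY (:first_sal, :second_sal)" is only run in its OR form (SQLite has no ANY) and cross-checked with Python's any().

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (last_name TEXT, salary INTEGER)")
rows = [("SMITH", 800), ("KING", 5000), ("JONES", 2975), ("BLAKE", 2850)]
con.executemany("INSERT INTO emp VALUES (?, ?)", rows)

in_list = con.execute(
    "SELECT last_name FROM emp WHERE last_name IN ('SMITH', 'KING', 'JONES') ORDER BY last_name"
).fetchall()
or_chain = con.execute(
    "SELECT last_name FROM emp WHERE last_name = 'SMITH' OR last_name = 'KING'"
    " OR last_name = 'JONES' ORDER BY last_name"
).fetchall()

# OR form of "salary > ANY (:first_sal, :second_sal)", with named bind variables.
binds = {"first_sal": 3000, "second_sal": 2900}
any_as_or = con.execute(
    "SELECT last_name FROM emp WHERE salary > :first_sal OR salary > :second_sal ORDER BY last_name",
    binds,
).fetchall()

# Same predicate in Python: keep a row if its salary beats at least one bound.
expected = sorted((n,) for n, s in rows if any(s > b for b in binds.values()))

print(in_list, or_chain, any_as_or, expected)
```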
The ANY syntax allows you to write things like
WHERE x > ANY(a, b, c)
or even
WHERE x > ANY(SELECT ... FROM ...)
Not sure whether there actually is anyone on the planet who uses ANY (and its brother ALL).
A quick Google search found this: http://theopensourcery.com/sqlanysomeall.htm
ANY allows you to use an operator other than =; in most other respects (there are special cases for NULLs) it acts like IN. You can think of IN as ANY with the = operator.
This is standard SQL. The SQL-92 standard states:
8.4 <in predicate>
[...]
<in predicate> ::=
<row value constructor>
[ NOT ] IN <in predicate value>
[...]
2) Let RVC be the <row value constructor> and let IPV be the <in predicate value>.
[...]
4) The expression
RVC IN IPV
is equivalent to
RVC = ANY IPV
So in fact, the behaviour of the <in predicate> is defined in terms of the 8.7 <quantified comparison predicate>. In other words, Oracle correctly implements the SQL standard here.
Perhaps one of the linked articles points this out, but isn't it true that when looking for a match (=) the two return the same thing? However, if looking for a range of answers (>, <, etc.), you cannot use "IN" and would have to use "ANY"...
I'm a newb, forgive me if I've missed something obvious...
MySQL explains ANY in its documentation pretty well:
The ANY keyword, which must follow a comparison operator, means “return TRUE if the comparison is TRUE for ANY of the values in the column that the subquery returns.” For example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (21,14,7) because there is a value 7 in t2 that is less than 10. The expression is FALSE if table t2 contains (20,10), or if table t2 is empty. The expression is unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).
https://dev.mysql.com/doc/refman/5.5/en/any-in-some-subqueries.html
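The three outcomes in the MySQL example follow from SQL's three-valued logic. This Python sketch models "x > ANY (values)" with None standing in for NULL/UNKNOWN; the function name gt_any is purely illustrative.

```python
from typing import Iterable, Optional

def gt_any(x: int, values: Iterable[Optional[int]]) -> Optional[bool]:
    """SQL-style 'x > ANY (values)'; None stands for NULL / UNKNOWN."""
    saw_unknown = False
    for v in values:
        if v is None:
            saw_unknown = True   # comparing with NULL is UNKNOWN
        elif x > v:
            return True          # one TRUE comparison is enough
    return None if saw_unknown else False  # empty set or all FALSE -> FALSE

print(gt_any(10, [21, 14, 7]))         # True
print(gt_any(10, [20, 10]))            # False
print(gt_any(10, []))                  # False
print(gt_any(10, [None, None, None]))  # None
```

A single TRUE wins even next to NULLs (gt_any(10, [None, 7]) is True); UNKNOWN only appears when no comparison is TRUE and at least one involves a NULL.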
Also Learning SQL by Alan Beaulieu states the following:
Although most people prefer to use IN, using = ANY is equivalent to using the IN operator.
The reason I always use ANY is that in some Oracle or MS SQL versions an IN list is limited to 1000/999 elements, while = ANY () is not limited to 1000.
Nobody likes their sql query crashing a web request.
So there is a practical difference.
The second reason is that it is the more modern form, as it parallels expressions like > ALL (...).
The third reason is that, to me as a non-native English speaker, "any" and "all" somehow appear more natural than IN.