I've got an assignment with the following instructions:
Create a view named A11T1 (that's A-One-One-T-One, not A-L-L-T-L) that will display the concatenated name, JobTitle and Salary of the people who have a Cat value of N and whose salary is at least 30 percent higher than the average salary of all people who have a Cat value of N. The three column headings should be Name, JobTitle and Salary. The rows should be sorted in traditional phonebook order.
Note 1: As always, concatenated names must appear with one space between the first and last names.
Note 2: The concatenated names and job titles must be displayed in proper case (e.g., Mary Ellen Smith, Assistant Manager) for this task.
Note 3: Remember, the Person11 data is messy. Be sure to look for N and n when you are identifying the people with a Cat value of N.
What I have so far is:
CREATE VIEW A11T1 AS
SELECT INITCAP(FNAME||' '||LNAME) AS "Name", INITCAP(JobTitle), Salary
FROM PERSON11
WHERE UPPER(CAT) = 'N'
GROUP by INITCAP(FNAME||' '||LNAME), INITCAP(JobTitle), Salary
HAVING SALARY >= 1.3 * ROUND(AVG(SALARY),0)
Order by LNAME, FNAME
The error I'm currently getting is:
Error at Command Line:7 Column:10 Error report: SQL Error: ORA-00979: not a GROUP BY expression 00979. 00000 - "not a GROUP BY expression"
No matter how much I edit my code it just won't create into a view and I've been stuck on this for hours! I appreciate any responses, even a point in the right direction.
Why do you need to "group by" concatenated name, job title and salary? Do you have more than one row per name?
Perhaps it's because you need to compute the average salary and that requires aggregation? You can't do everything in a single SELECT statement in SQL (at least not with simple tools - you seem to be in the early stages of learning and not looking to use window functions).
The "avg salary" needs to come from a subquery. Where you have >= 1.3 * round(...) you should have instead:
... >= 1.3 * (select avg(salary) from person11 where cat = 'N')
Note that the subquery must be enclosed in parentheses. In your code I see you use upper(cat) - is there a concern that cat may be upper or lower case? In that case it may be better to write
cat in ('n', 'N')
Avoid wrapping column values inside functions whenever possible (that often leads to worse performance). Also, I see no need to round the average salary in your requirements - and in any case, what's the point to rounding to zero decimal places if you then multiply by 1.3? Rounding may actually lead to incorrect output.
EDIT: Sorry, to clarify: I think you are well on your way already. Use the subquery for the average salary, remove the group by (it is unneeded, and it is what triggers ORA-00979: the ORDER BY at line 7, column 10 refers to LNAME, which is not one of the GROUP BY expressions), and if you care to, change the upper(cat) as I suggested; I think your query will work with these changes.
Good luck!
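To make the subquery suggestion concrete, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The sample rows and the cut-down column list are made up for illustration, and SQLite has no INITCAP, so only the filtering logic is shown, not the proper-case formatting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the PERSON11 table
conn.execute("CREATE TABLE person11 (fname TEXT, lname TEXT, cat TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO person11 VALUES (?, ?, ?, ?)",
    [
        ("ann", "lee", "N", 100.0),
        ("bob", "ray", "n", 100.0),  # messy lower-case Cat value
        ("cy", "fox", "N", 400.0),
        ("dee", "orr", "X", 900.0),  # other Cat value, must not affect the average
    ],
)

# The average comes from a scalar subquery, so no GROUP BY is needed
rows = conn.execute(
    """
    SELECT fname || ' ' || lname AS name, salary
    FROM person11
    WHERE cat IN ('n', 'N')
      AND salary >= 1.3 * (SELECT AVG(salary)
                           FROM person11
                           WHERE cat IN ('n', 'N'))
    ORDER BY lname, fname
    """
).fetchall()
print(rows)
```

With these rows the average for Cat N is 200, so the cut-off is 260 and only one person qualifies. Note that the subquery filters on both 'n' and 'N' as well; otherwise the average would be taken over a different set of people than the outer query.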
I think the easiest way uses window functions:
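CREATE VIEW A11T1 AS
-- every expression in a view needs a column alias, so JobTitle gets one too
SELECT INITCAP(FNAME || ' ' || LNAME) AS Name, INITCAP(JobTitle) AS JobTitle, Salary
-- AVG(...) OVER () attaches the average over all Cat = N rows to every row
FROM (SELECT p.*, AVG(SALARY) OVER () AS avg_salary
FROM PERSON11 p
WHERE UPPER(CAT) = 'N'
) p
WHERE SALARY >= 1.3 * avg_salary
ORDER BY LNAME, FNAME;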
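For anyone who wants to try the window-function idea without an Oracle instance, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the table contents are invented, and INITCAP is left out because SQLite does not have it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the PERSON11 table
conn.execute("CREATE TABLE person11 (fname TEXT, lname TEXT, cat TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO person11 VALUES (?, ?, ?, ?)",
    [
        ("ann", "lee", "N", 100.0),
        ("bob", "ray", "n", 100.0),
        ("cy", "fox", "N", 400.0),
        ("dee", "orr", "X", 900.0),
    ],
)

# The inner query keeps only Cat N rows and adds the overall average to each row;
# the outer query can then compare each salary with that average.
rows = conn.execute(
    """
    SELECT fname || ' ' || lname AS name, salary, avg_salary
    FROM (SELECT p.*, AVG(salary) OVER () AS avg_salary
          FROM person11 p
          WHERE UPPER(cat) = 'N') p
    WHERE salary >= 1.3 * avg_salary
    ORDER BY lname, fname
    """
).fetchall()
print(rows)
```

The point of the design is that the table is scanned once: the WHERE inside the inline view decides who counts toward the average, and the outer WHERE decides who is shown.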
I have the following tables and their fields
They ask me for a query that seems to me quite complex, I have been going around for two days and trying things, it says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And I got this out
When I add the person.birthdate field, instead of getting the 18 records for the 18 people who have a medal, I get many more records.
Apart from that, I still have to compute the average age with SYSDATE and TRUNC; I have tried many ways but cannot get it to work.
It looks very complicated to me, or maybe I'm just worn out from going around in circles. I need some help.
Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, see if it helps.
select p.gender,
((sysdate - p.birthday) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
When two dates are subtracted, the result is a number of days. Dividing it by 365, you roughly get the number of years (treating each year as 365 days is a simplification, of course, as not all years have that many days; hint: leap years). The result is usually a decimal number, e.g. 23.912874918724. You were told to remove the decimals, so use TRUNC and get 23 as the result.
Although the data model contains 5 tables, you don't have to use all of them in a query. Maybe the best approach is to go step by step. The first step would be to simply select all female athletes and calculate their age:
select p.gender,
((sysdate - p.birthday) / 365) age
from person p
where p.gender = 'F'
Note that I've used a table alias. I'd suggest you use them too, as they make queries easier to read (table names can be really long, which doesn't help readability). Also, always prefix columns with the table alias to avoid confusion about which column belongs to which table.
Once you're satisfied with that result, move on to another table: athlete. It is here just as a joining mechanism with the score table that contains ... well, scores. Note that I've used an outer join for the score table because not all athletes have won a medal. I presume that this is what the task you've been given says:
... even when there is no data of any specific value for the set of records displayed by the query.
It is suggested that we, as developers, use explicit table joins, which let you see all joins separated from filters (which should be part of the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time - don't skip steps. Move on to another one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.
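The age arithmetic and the final grouping step can be tried out in isolation. The sketch below is only an approximation of the task: it uses SQLite from Python, a single made-up table named medalist standing in for the joined person/athlete/score tables, a fixed reference date in place of SYSDATE so the result is repeatable, and CAST(... AS INTEGER) in place of Oracle's TRUNC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table; the real model spreads this over several tables
conn.execute("CREATE TABLE medalist (gender TEXT, birthdate TEXT, score TEXT)")
conn.executemany(
    "INSERT INTO medalist VALUES (?, ?, ?)",
    [
        ("F", "2000-01-01", "Gold"),
        ("F", "1994-06-15", "Gold"),
        ("F", "1998-03-10", "Silver"),
        ("M", "1990-01-01", "Gold"),  # filtered out by gender
    ],
)

reference_date = "2024-01-01"  # stands in for SYSDATE

# Days between two dates / 365, truncated to whole years, then averaged per medal
rows = conn.execute(
    """
    SELECT gender,
           score,
           AVG(CAST((julianday(:today) - julianday(birthdate)) / 365 AS INTEGER)) AS avg_age
    FROM medalist
    WHERE gender = 'F'
      AND score IN ('Gold', 'Silver', 'Bronze')
    GROUP BY gender, score
    ORDER BY avg_age
    """,
    {"today": reference_date},
).fetchall()
print(rows)
```

The individual ages here truncate to 24, 29 and 25, so the Gold group averages 26.5 and the Silver group 25. The birthdate itself is not in the GROUP BY, which is what keeps the output at one row per medal instead of one row per athlete.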
Using a subquery list all male employees with salaries higher than any female employee. Does this look right? Also I need to change it to an IN statement. Everything I've tried returns all female salaries.
SELECT FirstName, LastName, Salary
FROM Employee
WHERE Salary > ALL
(Select Salary FROM Employee
WHERE Gender = 'F')
Right, for the record, I think Gordon's answer is technically much better than what follows. He's doing it right. The requirement to use IN is frankly bizarre.
Since x > ALL y is naturally translated to a NOT EXISTS y such that x <= y
and EXISTS can relatively naturally be translated to IN, we could do this as x NOT IN (z such that z <= y). Assuming the Employee table has a PK field called EmployeeID, you can do this as follows (I threw in some bonus INs for you for free):
SELECT
m.FirstName,
m.LastName,
m.Salary
FROM
Employee m
WHERE
m.Gender IN ('M')
AND m.Salary is not null
AND m.EmployeeID NOT IN
(
select m2.EmployeeID
from Employee m2
where m2.Salary <= ANY (select f.Salary
FROM Employee f
where f.Gender IN ('F'))
)
AND 'Why' IN ('Why','IN','Why','?')
AND 'Why' NOT IN ('Another','Way','?')
This is horrid and much less efficient than Gordon's answer.
If this is part of an assignment for a course, one has to wonder what the person setting this assignment was thinking. There's the obvious possibility that they simply don't know what they are doing, but perhaps they wanted you to jump through some logical hoops or to teach you that it's often possible to achieve the same thing in multiple different ways and that these may or may not be equally efficient.
First, I would write this as:
SELECT m.*
FROM Employee m
WHERE m.Gender = 'M' AND
m.Salary > (SELECT MAX(f.Salary)
FROM Employee f
WHERE f.Gender = 'F'
);
I like this version because it is quite explicit. It is almost a direct translation of "Get all males that earn more than the highest paid female."
Your version also works, under the assumption that gender only takes on two values. However, it relies on subtlety: The comparison is >. So, no female can have a salary higher than all females (including herself). Hence, the outer query can only return not-females, which happen to be males.
Instead of going through those mental gymnastics, the condition on = 'M' makes the intention of the query much clearer.
With that condition, the query is clearer:
SELECT m.*
FROM Employee m
WHERE m.Gender = 'M' AND
m.Salary > ALL (SELECT f.Salary
FROM Employee f
WHERE f.Gender = 'F'
);
I should note that this version is better than the version with MAX() in one edge case. When there are no females, this will return all male employees. The version with MAX() will return no rows at all, because MAX() over an empty set is NULL and a comparison with NULL is never true.
I should add: It is not obvious how to turn this query into a version using IN, at least in a sensible way.
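The edge case with no females is easy to check. The sketch below uses SQLite from Python with an invented three-column employee table. SQLite does not support the ALL quantifier, so the second query uses the NOT EXISTS form that > ALL is equivalent to when salaries are not NULL; that substitution is mine, not part of the answer above.

```python
import sqlite3

MAX_VERSION = """
    SELECT m.name FROM employee m
    WHERE m.gender = 'M'
      AND m.salary > (SELECT MAX(f.salary) FROM employee f WHERE f.gender = 'F')
    ORDER BY m.name
"""

# Stand-in for "m.salary > ALL (...)": no female earns as much or more
ALL_VERSION = """
    SELECT m.name FROM employee m
    WHERE m.gender = 'M'
      AND NOT EXISTS (SELECT 1 FROM employee f
                      WHERE f.gender = 'F' AND f.salary >= m.salary)
    ORDER BY m.name
"""

def males_above_females(rows):
    """Run both versions on a fresh table and return (max_result, all_result)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, gender TEXT, salary REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    with_max = [r[0] for r in conn.execute(MAX_VERSION)]
    with_all = [r[0] for r in conn.execute(ALL_VERSION)]
    conn.close()
    return with_max, with_all

males_only = [("Al", "M", 50.0), ("Bo", "M", 70.0)]
no_females = males_above_females(males_only)
one_female = males_above_females(males_only + [("Cy", "F", 60.0)])
print(no_females)
print(one_female)
```

With no females in the table the MAX() query returns nothing while the ALL-style query returns both men; once a female earning 60 is added, both queries agree on Bo.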
I am practicing SQL in Microsoft SQL Server 2012 (not a homework question), and have a table Names. The table shows baby names by year, with columns Sex (gender of name), N (number of babies having that name), Yr (year), and Name (the name itself).
I need to write a query using only one SELECT statement that returns the most popular baby name by year, with gender, the year, and the number of babies named. So far I have;
SELECT *
From Names
ORDER By N DESC;
Which gives the highest values of N in DESC order, repeating years. I need to limit it to only the highest value in each year, and everything I have tried to do so has thrown errors. Any advice you can give me for this would be appreciated.
Off the top of my head, something like the following would normally let you do it in (technically) one SELECT statement. That statement includes sub-SELECTs, but I'm not immediately seeing an alternative that wouldn't.
When there are joint top-ranking names, both queries should bring back all joint top results, so there may not be exactly one answer per group. If you then just need a single representative row from those results, look at using select top 1, perhaps adding order by to get the first alphabetically.
Most popular by year regardless of gender:
-- ONE PER YEAR (column names as in the question: Sex, N, Yr, Name):
SELECT a.Yr, a.Name, a.Sex, a.N FROM Names a
WHERE NOT EXISTS (
SELECT 1 FROM Names b
WHERE b.Yr = a.Yr
AND b.N > a.N
)
Most popular by year for each gender:
-- ONE PER GENDER PER YEAR:
SELECT a.Yr, a.Name, a.Sex, a.N FROM Names a
WHERE NOT EXISTS (
SELECT 1 FROM Names b
WHERE b.Yr = a.Yr
AND b.Sex = a.Sex
AND b.N > a.N
)
Performance is, despite the verbosity of the SQL, usually on a par with alternatives when using this pattern (often better).
There are other approaches, including using GROUP BY, but personally I find this one more readable, and it is standard across DBMSs.
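Here is a small runnable sketch of the NOT EXISTS pattern, using SQLite from Python. It uses the column names from the question (Sex, N, Yr, Name) and a handful of made-up rows, including a tie in one year to show that joint winners are all returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (sex TEXT, n INTEGER, yr INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?, ?)",
    [
        ("F", 10, 2000, "Emma"),
        ("M", 12, 2000, "Noah"),
        ("F", 12, 2000, "Olivia"),  # ties with Noah in 2000
        ("M", 9, 2001, "Liam"),
        ("F", 7, 2001, "Ava"),
    ],
)

# A row wins its year if no other row in the same year has a larger count
top_per_year = conn.execute(
    """
    SELECT a.yr, a.name, a.sex, a.n FROM names a
    WHERE NOT EXISTS (SELECT 1 FROM names b
                      WHERE b.yr = a.yr AND b.n > a.n)
    ORDER BY a.yr, a.name
    """
).fetchall()

# Same idea, but the competition is restricted to the same sex as well
top_per_sex_and_year = conn.execute(
    """
    SELECT a.yr, a.name, a.sex, a.n FROM names a
    WHERE NOT EXISTS (SELECT 1 FROM names b
                      WHERE b.yr = a.yr AND b.sex = a.sex AND b.n > a.n)
    ORDER BY a.yr, a.sex
    """
).fetchall()

print(top_per_year)
print(top_per_sex_and_year)
```

In the first result the year 2000 appears twice because of the tie; the second result has one row per sex per year.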
USE Saleslogix
DECLARE @AssumedGrowth int
SET @AssumedGrowth = 28
SELECT
account,
employees as NumberIn2013,
@AssumedGrowth += employees as NumberIn2014
FROM sysdba.account
WHERE employees <> 'NULL'
and account like 'Shaw%'
It's telling me that += is invalid and only works with +. Can someone help me with getting this example to work as a compound operator? I don't know if it makes too much difference, but I am using 2005 Management Studio.
Also, if it's not a huge pain, could you add the same example with @AssumedGrowth being a percentage?
What you are trying to do is this:
SELECT account, employees as NumberIn2013,
(@AssumedGrowth = @AssumedGrowth + employees) as NumberIn2014
FROM sysdba.account
WHERE employees <> 'NULL' and account like 'Shaw%';
But, I don't think it will work. I would instead suggest using the built-in capabilities, in particular, row_number():
SELECT account, employees as NumberIn2013,
employees * power(1 + @AssumedGrowth/100.0, row_number() over (order by <field>) - 1)
FROM sysdba.account
WHERE employees <> 'NULL' and account like 'Shaw%';
Note that you need to specify the ordering for the results. Presumably, there is some sort of id or datetime column that has the appropriate order. Tables represent unordered sets, so there is no "first" row.
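If the goal is simply a per-row projection (each account's 2013 headcount plus a fixed number, or grown by a percentage), a plain expression is enough and no compound operator is needed. The sketch below shows that reading of the question using SQLite from Python. The table contents are invented, the variable is passed as a bound parameter instead of a T-SQL variable, and IS NOT NULL is used on the assumption that the intent of the filter was to skip missing employee counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for sysdba.account
conn.execute("CREATE TABLE account (account TEXT, employees INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("Shaw A", 100), ("Shaw B", None), ("Other Co", 50)],
)

params = {"growth": 28, "pct": 25}  # absolute growth and percentage growth

rows = conn.execute(
    """
    SELECT account,
           employees                         AS NumberIn2013,
           employees + :growth               AS NumberIn2014,
           employees * (1 + :pct / 100.0)    AS NumberIn2014Pct
    FROM account
    WHERE employees IS NOT NULL
      AND account LIKE 'Shaw%'
    """,
    params,
).fetchall()

# Dividing two integers truncates, which is why the percentage uses 100.0
int_div, real_div = conn.execute("SELECT 25 / 100, 25 / 100.0").fetchone()

print(rows)
print(int_div, real_div)
```

The 100.0 matters in SQL Server as well: with an int variable, dividing by 100 gives 0 for any growth below 100 percent.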
I just stumbled upon something in ORACLE SQL (not sure if it's in others), that I am curious about. I am asking here as a wiki, since it's hard to try to search symbols in google...
I just found that when checking a value against a set of values you can do
WHERE x = ANY (a, b, c)
As opposed to the usual
WHERE x IN (a, b, c)
So I'm curious, what is the reasoning for these two syntaxes? Is one standard and one some oddball Oracle syntax? Or are they both standard? And is there a preference of one over the other for performance reasons, or ?
Just curious what anyone can tell me about that '= ANY' syntax.
ANY (or its synonym SOME) is syntactic sugar for EXISTS with a simple correlation:
SELECT *
FROM mytable
WHERE x <= ANY
(
SELECT y
FROM othertable
)
is the same as:
SELECT *
FROM mytable m
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE m.x <= o.y
)
With the equality condition on a not-nullable field, it becomes similar to IN.
All major databases, including SQL Server, MySQL and PostgreSQL, support this keyword.
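The EXISTS form above can be run as-is in SQLite, which is handy because SQLite itself does not accept the ANY keyword. The sketch below, with made-up table contents, runs the EXISTS query from Python and compares it with the same condition written using Python's built-in any().

```python
import sqlite3

xs = [1, 5, 9]
ys = [4, 6]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (x INTEGER)")
conn.execute("CREATE TABLE othertable (y INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(x,) for x in xs])
conn.executemany("INSERT INTO othertable VALUES (?)", [(y,) for y in ys])

# EXISTS with a simple correlation: the rewritten form of "x <= ANY (SELECT y ...)"
via_exists = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.x
        FROM mytable m
        WHERE EXISTS (SELECT NULL FROM othertable o WHERE m.x <= o.y)
        ORDER BY m.x
        """
    )
]

# The same predicate expressed directly: true if it holds for at least one y
via_any = [x for x in xs if any(x <= y for y in ys)]

print(via_exists, via_any)
```

Both lists contain 1 and 5: each is less than or equal to at least one value in othertable, while 9 is not.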
IN- Equal to any member in the list
ANY- Compare value to **each** value returned by the subquery
ALL- Compare value to **EVERY** value returned by the subquery
<ANY() - less than maximum
>ANY() - more than minimum
=ANY() - equivalent to IN
>ALL() - more than the maximum
<ALL() - less than the minimum
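The shorthand rules in this list can be sanity-checked with Python's any() and all(), which behave like the SQL quantifiers as long as the set is non-empty and contains no NULLs (both of those are assumptions of this sketch; the sample values are arbitrary).

```python
values = [3, 7, 5]  # plays the role of the subquery result

def check_rules(x, vals):
    """Return True if every shorthand rule holds for this x."""
    return (
        (any(x < v for v in vals) == (x < max(vals)))       # <ANY: less than the maximum
        and (any(x > v for v in vals) == (x > min(vals)))   # >ANY: more than the minimum
        and (any(x == v for v in vals) == (x in vals))      # =ANY: same as IN
        and (all(x > v for v in vals) == (x > max(vals)))   # >ALL: more than the maximum
        and (all(x < v for v in vals) == (x < min(vals)))   # <ALL: less than the minimum
    )

rules_hold = all(check_rules(x, values) for x in range(0, 10))
print(rules_hold)
```

For an empty set the min/max shortcuts break down: ALL over nothing is true and ANY over nothing is false, whereas MIN and MAX return NULL.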
eg:
Find the employees who earn the same salary as the minimum salary for each department:
SELECT last_name, salary,department_id
FROM employees
WHERE salary IN (SELECT MIN(salary)
FROM employees
GROUP BY department_id);
Employees who are not IT Programmers and whose salary is less than that of any IT programmer:
SELECT employee_id, last_name, salary, job_id
FROM employees
WHERE salary <ANY
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Employees whose salary is less than the salary of all employees with a job ID of IT_PROG and whose job is not IT_PROG:
SELECT employee_id,last_name, salary,job_id
FROM employees
WHERE salary <ALL
(SELECT salary
FROM employees
WHERE job_id = 'IT_PROG')
AND job_id <> 'IT_PROG';
Hope it helps.
-Noorin Fatima
To put it simply and quoting from O'Reilly's "Mastering Oracle SQL":
"Using IN with a subquery is functionally equivalent to using ANY, and returns TRUE if a match is found in the set returned by the subquery."
"We think you will agree that IN is more intuitive than ANY, which is why IN is almost always used in such situations."
Hope that clears up your question about ANY vs IN.
I believe that what you are looking for is this:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/opt_ops.htm#1005298
(Link found on Eddie Awad's Blog)
To sum it up here:
last_name IN ('SMITH', 'KING', 'JONES')
is transformed into
last_name = 'SMITH' OR last_name = 'KING' OR last_name = 'JONES'
while
salary > ANY (:first_sal, :second_sal)
is transformed into
salary > :first_sal OR salary > :second_sal
The optimizer transforms a condition that uses the ANY or SOME operator followed by a subquery into a condition containing the EXISTS operator and a correlated subquery.
The ANY syntax allows you to write things like
WHERE x > ANY(a, b, c)
or even
WHERE x > ANY(SELECT ... FROM ...)
Not sure whether there actually is anyone on the planet who uses ANY (and its brother ALL).
A quick google found this http://theopensourcery.com/sqlanysomeall.htm
ANY allows you to use an operator other than =; in most other respects (apart from special cases for nulls) it acts like IN. You can think of IN as ANY with the = operator.
This is a standard. The SQL 1992 standard states
8.4 <in predicate>
[...]
<in predicate> ::=
<row value constructor>
[ NOT ] IN <in predicate value>
[...]
2) Let RVC be the <row value constructor> and let IPV be the <in predicate value>.
[...]
4) The expression
RVC IN IPV
is equivalent to
RVC = ANY IPV
So in fact, the <in predicate> behaviour definition is based on the 8.7 <quantified comparison predicate>. In other words, Oracle correctly implements the SQL standard here.
Perhaps one of the linked articles points this out, but isn't it true that when looking for a match (=) the two return the same thing. However, if looking for a range of answers (>, <, etc) you cannot use "IN" and would have to use "ANY"...
I'm a newb, forgive me if I've missed something obvious...
MySQL clears up ANY in its documentation pretty well:
The ANY keyword, which must follow a comparison operator, means
“return TRUE if the comparison is TRUE for ANY of the values in the
column that the subquery returns.” For example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The
expression is TRUE if table t2 contains (21,14,7) because there is a
value 7 in t2 that is less than 10. The expression is FALSE if table
t2 contains (20,10), or if table t2 is empty. The expression is
unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).
https://dev.mysql.com/doc/refman/5.5/en/any-in-some-subqueries.html
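The TRUE / FALSE / unknown outcomes in the quoted passage can be mimicked with a few lines of Python. This is only an illustrative model of SQL's three-valued logic, not anything MySQL provides: the function name gt_any is made up, and None stands in for NULL.

```python
def gt_any(x, values):
    """Model of `x > ANY (values)`: returns True, False, or None (unknown)."""
    if x is None:
        # NULL compared with anything is unknown, unless there is nothing to compare with
        return None if values else False
    saw_null = False
    for v in values:
        if v is None:
            saw_null = True   # this comparison is unknown; keep looking
        elif x > v:
            return True       # one TRUE comparison is enough
    # No comparison was TRUE: unknown if any was NULL, otherwise FALSE
    return None if saw_null else False

print(gt_any(10, [21, 14, 7]))         # 7 is less than 10
print(gt_any(10, [20, 10]))            # nothing is less than 10
print(gt_any(10, []))                  # empty set
print(gt_any(10, [None, None, None]))  # only NULLs
```

The four calls reproduce the four cases from the documentation: TRUE, FALSE, FALSE for the empty table, and unknown when every value is NULL. In a WHERE clause, unknown behaves like FALSE in that the row is not returned.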
Also Learning SQL by Alan Beaulieu states the following:
Although most people prefer to use IN, using = ANY is equivalent to
using the IN operator.
The reason I always use ANY is that in some Oracle or MSSQL versions an IN list is limited to 1000/999 elements, while = any () is not limited to 1000.
Nobody likes their SQL query crashing a web request.
So there is a practical difference.
The second reason is that it is the more modern form, as it parallels expressions like > all (...).
The third reason is that, for me as a non-native English speaker, it somehow feels more natural to use "any" and "all" than IN.