Complex query in Oracle SQL

I have the following tables and their fields
They ask me for a query that seems quite complex to me; I have been going around in circles for two days trying things. The task says:
It is desired to obtain the average age of female athletes, medal winners (gold, silver or bronze), for the different modalities of 'Artistic Gymnastics'. Analyze the possible contents of the result field in order to return only the expected values, even when there is no data of any specific value for the set of records displayed by the query. Specifically, we want to show the gender indicator of the athletes, the medal obtained, and the average age of these athletes. The age will be calculated by subtracting from the system date (SYSDATE), the date of birth of the athlete, dividing said value by 365. In order to avoid showing decimals, truncate (TRUNC) the result of the calculation of age. Order the results by the average age of the athletes.
Well right now I have this:
select person.gender,score.score
from person,athlete,score,competition,sport
where person.idperson = athlete.idathlete and
athlete.idathlete= score.idathlete and
competition.idsport = sport.idsport and
person.gender='F' and competition.idsport=18 and score.score in
('Gold','Silver','Bronze')
group by
person.gender,
score.score;
And this is the result I got.
By adding the person.birthdate field, instead of getting 18 records for the 18 people who have a medal, I get many more records.
Apart from that, I still have to calculate the average age with SYSDATE and TRUNC, which I have tried in many ways without success.
I find it very complicated, or maybe I'm just saturated from going around in circles so much; I need some help.

Reading the task you got, it seems that you're quite close to the solution. Have a look at the following query and its explanation, note the differences from your query, and see if it helps.
select p.gender,
((sysdate - p.birthdate) / 365) age,
s.score
from person p join athlete a on a.idathlete = p.idperson
left join score s on s.idathlete = a.idathlete
left join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
and s.score in ('Gold', 'Silver', 'Bronze')
and c.idsport = 18
order by age;
When two dates are subtracted, the result is the number of days between them. Dividing that by 365, you - roughly - get the number of years (roughly because, for simplicity, we pretend every year has 365 days, which leap years don't respect). The result is usually a decimal number, e.g. 23.912874918724. In order to avoid that, you were told to remove the decimals, so - use TRUNC and get 23 as the result.
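For example, a quick standalone check you can run as-is (the birth date here is made up):
select trunc((sysdate - date '2000-03-15') / 365) as age
from dual;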
Although the data model contains 5 tables, you don't have to use all of them in a query. Maybe the best approach is to go step by step. The first step would be to simply select all female athletes and calculate their age:
select p.gender,
((sysdate - p.birthdate) / 365) age
from person p
where p.gender = 'F'
Note that I've used table aliases - I'd suggest you use them too, as they make queries easier to read (tables can have really long names, which doesn't help readability). Aliases also avoid confusion about which column belongs to which table.
Once you're satisfied with that result, move on to another table - athlete. It is there just as a joining mechanism to the score table, which contains ... well, scores. Note that I've used an outer join for the score table because not all athletes have won a medal. I presume that this is what the task you've been given means by:
... even when there is no data of any specific value for the set of records displayed by the query.
It is suggested that we - as developers - use explicit join syntax, which lets you see all the joins separated from the filters (which belong in the WHERE clause). So:
NO : from person p, athlete a
where a.idathlete = p.idperson
and p.gender = 'F'
YES: from person p join athlete a on a.idathlete = p.idperson
where p.gender = 'F'
Then move to yet another table, and so forth.
Test frequently, all the time - don't skip steps. Move on to another one only when you're sure that the previous step's result is correct, as - in most cases - it won't automagically fix itself.
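Once all the individual steps work, putting them together with the aggregation the task asks for might look roughly like this - a sketch only, assuming the column name p.birthdate and idsport = 18 from your post; whether TRUNC wraps each age or the average, and whether any join should stay an outer join, depends on how you read the "no data of any specific value" remark:
select p.gender,
       s.score,
       trunc(avg((sysdate - p.birthdate) / 365)) avg_age
from person p
join athlete a on a.idathlete = p.idperson
join score s on s.idathlete = a.idathlete
join competition c on c.idcompetition = s.idcompetition
where p.gender = 'F'
  and s.score in ('Gold', 'Silver', 'Bronze')
  and c.idsport = 18
group by p.gender, s.score
order by avg_age;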

Related

SQL: Calculate the rating based on different columns and use it as an argument

I'm trying to calculate the rating based on a table that has 3 columns with different ratings ranging from 1 to 5.
I wanted to calculate the average of these 3 values and then be able to use this as an argument in queries, for example:
Where Rating >3.5
At the moment I have this, which gives me the average for all suppliers:
SELECT c.Name
,(SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
FROM(
VALUES(b.Qty_Price),
(b.Quality),
(b.DeliveryTime)) A (rat)) AS Rating
FROM Order a
JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier
What I would like now is, for example, to filter the suppliers that have a rating higher than 3.5, but I don't know how I can do that. If anyone can help I would be grateful.
If the query works as you want it, you get the average for every entry.
A WHERE Rating > 3.5 cannot simply be added, because Rating does not exist as a column in the tables we JOIN; it only exists in the context of the SELECT clause.
To overcome this, we can keep the query you have made, give it a name using WITH, and SELECT from that sub-query WHERE rating > 3.5.
It should look something like this:
WITH Averages(name, rating) AS
(SELECT c.name
,(SELECT CAST(AVG(rat) AS DECIMAL(5, 2))
FROM(
VALUES(b.qty_Price),
(b.quality),
(b.deliveryTime)) AS A (rat)) AS rating
FROM Order a
JOIN Evaluation b ON b.ID_Evaluation = a.ID_Evaluation
JOIN Supplier c ON c.NIF_Supplier = a.NIF_Supplier)
SELECT name, rating FROM Averages WHERE rating > 3.5;
Now we simply refer to the query you provided as Averages, for example, and we SELECT from that WHERE rating > 3.5.
Also note that you can have multiple WITH ... AS blocks to make things easier for you, but remember that a comma (,) is needed to separate them. In our case we only have one, so there is no need for a comma or semi-colon after ...= a.NIF_Supplier).
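Purely as an illustration of the comma between WITH blocks, here is a self-contained toy example (the names Numbers and Doubled are made up):
WITH Numbers(n) AS
(
    SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
),
Doubled(n, twice) AS
(
    SELECT n, n * 2 FROM Numbers
)
SELECT n, twice FROM Doubled WHERE twice > 2;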
One more note: the A you wrote before (rat) is the alias of the VALUES derived table and is required; you can also spell it AS A (rat) to make it explicit. Also, keeping attributes lowercase makes it easier for all of us to distinguish tables from attributes.
Cheers!

Nested subquery in Access alias causing "enter parameter value"

I'm using Access (I normally use SQL Server) for a little job, and I'm getting "enter parameter value" for Night.NightId in the statement below that has a subquery within a subquery. I expect it would work if I wasn't nesting it two levels deep, but I can't think of a way around it (query ideas welcome).
The scenario is pretty simple, there's a Night table with a one-to-many relationship to a Score table - each night normally has 10 scores. Each score has a bit field IsDouble which is normally true for two of the scores.
I want to list all of the nights, with a number next to each representing how many of the top 2 scores were marked IsDouble (would be 0, 1 or 2).
Here's the SQL; I've tried lots of combinations of adding aliases to the columns and the tables, but I've taken them out for simplicity below:
select Night.*,
       (select sum(IIF(IsDouble, 1, 0))
        from (SELECT top 2 *
              from Score
              where NightId = Night.NightId
              order by Score desc, IsDouble asc, ID)
       ) as TopTwoMarkedAsDoubles
from Night
This is a bit of speculation. However, some databases have issues with correlation conditions in multiply nested subqueries. MS Access might have this problem.
If so, you can solve this by using aggregation with a where clause that chooses the top two values:
select s.nightid,
sum(IIF(IsDouble, 1, 0)) as TopTwoMarkedAsDoubles
from Score as s
where s.id in (select top 2 s2.id
from score as s2
where s2.nightid = s.nightid
order by s2.score desc, s2.IsDouble asc, s2.id
)
group by s.nightid;
If this works, it is a simple matter to join Night back in to get the additional columns.
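For example, the join back to Night might look something like this (an untested sketch using the column names above; alternatively, save the aggregate as its own query in Access and join to that):
SELECT n.*, t.TopTwoMarkedAsDoubles
FROM Night AS n
LEFT JOIN (SELECT s.nightid,
                  SUM(IIF(s.IsDouble, 1, 0)) AS TopTwoMarkedAsDoubles
           FROM Score AS s
           WHERE s.id IN (SELECT TOP 2 s2.id
                          FROM Score AS s2
                          WHERE s2.nightid = s.nightid
                          ORDER BY s2.score DESC, s2.IsDouble ASC, s2.id)
           GROUP BY s.nightid) AS t
ON t.nightid = n.NightId;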
Your subquery can only see one level above it, so Night.NightId is totally unknown to it, hence why you are being prompted to enter a value. You can use a GROUP BY to get the value you want for each NightId, then correlate that back to the original Night table.
Select *
From Night
left join (
Select N.NightId
, sum(IIF(S.IsDouble,1,0)) as [Number of Doubles]
from Night N
inner join Score S
on S.NightId = N.NightId
group by N.NightId) NightsWithScores
on Night.NightId = NightsWithScores.NightId
Because of the IIF(S.IsDouble, 1, 0), I don't see the point in using top.

Writing a query to include certain values but exclude others when looking for a latest time period

I am trying to write a query that looks for people who have a certain code in the latest period (year), but not if they also have another code in that same latest period (year). I'll be explicit so my example makes sense.
I want people who have the code A1, A2, A3, A4 or A5, but not AG, AP or AQ. There are people who have an A1 code for a period (like 2014) and an AG code for the same period; I'd like to exclude them. Not everyone has a code, so the field value could be NULL.
Is there a way to express this in a different way (i.e. less characters) than the way I did?
SELECT
people.firstName
FROM
people
WHERE EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (SELECT MAX(period) FROM codes codes2 WHERE codes2.people_id = codes.people_id)
AND code LIKE 'A[1-5]'
)
AND NOT EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (
SELECT MAX(period)
FROM codes codes2
WHERE codes2.people_id = codes.people_id
)
AND code LIKE 'A[GPQ]'
)
Schema is as follows:
People
id (PK)
firstName
Codes
people_id (FK) many to one relation with People table
code (e.g. "A1", "A2", "AG")
period (e.g. "2013", "2014")
There are so many ways you could do that. I'm not an SQL expert, but I can't see your query being too bad. If you want to try to reduce the number of sub-queries, you could consider using a GROUP BY clause along with a SUM aggregate function in a HAVING clause.
I started updating your code as follows:
SELECT
people.firstName
FROM
people
LEFT JOIN codes AS a15 ON a15.people_id = people.id AND a15.code LIKE 'A[1-5]'
LEFT JOIN codes AS agpq ON agpq.people_id = people.id AND agpq.code LIKE 'A[GPQ]'
GROUP BY
people.firstName
HAVING
SUM(CASE WHEN a15.code IS NULL THEN 0 ELSE 1 END) > 0
AND SUM(CASE WHEN agpq.code IS NULL THEN 0 ELSE 1 END) = 0
This, however, doesn't take into account any of the period-specific requirements described. You could add the period to the GROUP BY clause, or add it to a WHERE clause or to one of the JOIN constraints, but I'm not quite sure from your description exactly what you're after (I don't believe this is through any fault of your own; I just can't personally align the code provided with the description).
I would also like to point out that the SUM functions above will not give an accurate count of the number of matching codes. If both the A[GPQ] and A[1-5] joins return at least one row, the rows get multiplied together: someone with two A[1-5] rows and three A[GPQ] rows produces 2 x 3 = 6 joined rows, so each SUM returns 6 rather than 2 or 3. The SUMs can still be used to determine whether there are "any" matching rows, though, because whenever the criteria are matched the SUM(...) is > 0.
I'm sure a more experienced SQL Developer / DBA will be able to poke many holes in my proposed query but it might give them or someone else something to work from and hopefully gives you ideas for alternatives to using sub-queries.
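For what it's worth, one way to fold the latest-period requirement back in while keeping the counts accurate could be to aggregate the codes table once per person, along these lines (a sketch only, reusing the MAX(period) subquery and LIKE patterns from the question):
SELECT p.firstName
FROM people AS p
JOIN codes AS c ON c.people_id = p.id
WHERE c.period = (SELECT MAX(c2.period)
                  FROM codes AS c2
                  WHERE c2.people_id = c.people_id)
GROUP BY p.id, p.firstName
HAVING SUM(CASE WHEN c.code LIKE 'A[1-5]' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN c.code LIKE 'A[GPQ]' THEN 1 ELSE 0 END) = 0;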

Join two queries together with default values of 0 if empty in Access

I have seen some close answers and I have been trying to adapt them to Access 2013, but I can't seem to get it to work. I have two queries:
First query returns
original_staff_data
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
It pulls this from the tables staff, salary_by_month, employee_name, and number_of_days_at_spec_building (which records where they check in when they work).
Second query returns
transaction_data_by_staff
Month
Year
staff_uid
total_revenue
total_profit
This also pulls information from staff, but it sums over multiple dates in a transaction table, creating a cumulative value for each staff_uid, so I can't combine the two queries directly.
My problem is that I want to create a query that brings the results of both together. However, not all staff members in Q1 will be in Q2 every day/week/month (vacations, etc.), and I ultimately want to create a final result:
Final_Result
Month
Year
staff_uid
staff_abbrev
employee_name
staff_salary
total_revenue
total_profit
The SQL:
SELECT
original_staff_data.*
, transaction_data_by_staff.total_rev
, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff
RIGHT JOIN original_staff_data
ON (
transaction_data_by_staff.year = original_staff_data.year
AND transaction_data_by_staff.month = original_staff_data.month
) WHERE transaction_data_by_staff.[staff_uid] = [original_staff_data].[staff_uid];
I would like it so that if there is no revenue or profit that month for an employee, those values become 0. I have tried a join (specifically a RIGHT JOIN with Q1 on the right side) and it doesn't seem to work; I still only get the subset. The original_staff_data query has 750 entries, so the final query should also have 750 entries, but I am only getting 252, which is the total in transaction_data_by_staff. Any clue on how the Access 2013 SQL should look?
Thanks
Jon
Move the link by staff_uid to the ON clause, like this:
SELECT original_staff_data.*, transaction_data_by_staff.total_rev, transaction_data_by_staff.total_profit
FROM transaction_data_by_staff RIGHT JOIN original_staff_data ON (transaction_data_by_staff.year = original_staff_data.year) AND (transaction_data_by_staff.month = original_staff_data.month)
AND (((transaction_data_by_staff.[staff_uid])=[original_staff_data].[staff_uid]));
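And if you also want the missing revenue and profit to show as 0 instead of Null, as mentioned in the question, wrapping those two columns in Access's Nz() function is one option; for example (the *_0 alias names are arbitrary, chosen to avoid Access's circular-alias complaint):
SELECT original_staff_data.*,
       Nz(transaction_data_by_staff.total_rev, 0) AS total_rev_0,
       Nz(transaction_data_by_staff.total_profit, 0) AS total_profit_0
FROM transaction_data_by_staff RIGHT JOIN original_staff_data
ON (transaction_data_by_staff.year = original_staff_data.year)
AND (transaction_data_by_staff.month = original_staff_data.month)
AND (transaction_data_by_staff.[staff_uid] = [original_staff_data].[staff_uid]);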

How to optimize group by in table with huge number of records

I have a Person table with a huge number of records (about 16 million), and a requirement to find all persons with the same last name, first letter of the first name, and birth year; in other words, I want to show presumed duplicate persons in the UI so users can analyze them and decide whether they are the same person or not.
Here is the query I wrote:
SELECT *
FROM Person INNER JOIN
(
SELECT SUBSTRING(firstName, 1, 1) firstNameF,lastName,YEAR(birthDate) birthYear
FROM Person
GROUP BY SUBSTRING(firstName, 1,1),lastName,YEAR(birthDate)
HAVING count(*) > 1
) as dupPersons
ON SUBSTRING(Person.firstName,1,1) = dupPersons.firstNameF and Person.lastName = dupPersons.lastName and YEAR(Person.birthDate) = dupPersons.birthYear
order by Person.lastName,Person.firstName
But as I am not an SQL expert, I want to know: is this a good way to do it? Is there a more optimized way?
EDIT
Note that I can cut the data, which could contribute to optimization.
For example, if I cut the data in two, it could return these persons:
Johan Smith |
Jane Smith | have same lastname and first name initial
Jack Smith |
Mark Tween | have same lastname and first name initial
Mac Tween |
If the performance using a GROUP BY is not adequate, you could try using an INNER JOIN:
SELECT *
FROM Person p1
INNER JOIN Person p2 ON p2.PersonID > p1.PersonID
WHERE SUBSTRING(p2.Firstname, 1, 1) = SUBSTRING(p1.Firstname, 1, 1)
AND p2.LastName = p1.LastName
AND YEAR(p2.BirthDate) = YEAR(p1.BirthDate)
ORDER BY
p1.LastName, p1.FirstName
Well, if you're not an expert, the query you wrote says to me that you're at least pretty competent. When we look at whether a query is "optimized", there are two immediate parts to that:
1. The query just on its own has something notably wrong with it - a bad join, keyword misuse, exploding result set size, superstitions about NOT IN, etc.
2. The context that the query operates within - DB specifics, task specifics, etc.
Your query passes #1, no problem. I would have written it differently - aliased the Person table, used LEFT(P.FirstName, 1) instead of SUBSTRING, and used a CTE (WITH-clause) instead of a subquery. But these aren't optimization issues. Maybe I'd use WITH(READUNCOMMITTED) if the results weren't sensitive to dirty reads. Out of any further context, your query doesn't look like a bomb waiting to go off.
As for #2 - You should probably switch to specifics. Like "I have to run this every week. It takes 17 minutes. How can I get it down to under a minute?" Then people will ask you what your plan looks like, what indexes you have, etc.
Things I'd want to know:
How long does it already take to run?
What's your runtime window? (User & app tolerance for query time.)
Is this run once a day? Week? Month? Quarter?
Do you have the permission to create tables, change current tables, or alter indexes?
Maybe based on having run it, what's the ratio of duplicates you're expecting to find? 5%? 90%?
How stable is the matching criteria requirement?
Example scenario: if this was a run-on-command feature that will be in my app indefinitely, it gets run weekly, 10% or fewer records are expected to be duplicates, I have the ability to change the DB how I'd like, the duplicate-matching criteria are firm (not fluctuating), and I want to cut it from 90s to 5s, then I'd create a dedicated BirthYear column (possibly a persisted computed column off of BirthDate) and an index on LastName ASC, BirthYear ASC, FirstName ASC. If too many of those stipulations change, I might go in a different direction entirely.
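As a rough sketch of that last scenario (the column and index names here are made up; BirthDate, FirstName and LastName are taken from the query above):
-- YEAR() is deterministic, so the computed column can be persisted and indexed
ALTER TABLE Person
    ADD BirthYear AS YEAR(BirthDate) PERSISTED;

-- Covers the grouping columns: last name, birth year, first name (for its first letter)
CREATE INDEX IX_Person_DupCheck
    ON Person (LastName ASC, BirthYear ASC, FirstName ASC);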
You can try something like this and see the difference in the execution plans, or benchmark the performance of the results:
;WITH DupPersons AS
(
SELECT *, COUNT(1) OVER(PARTITION BY SUBSTRING(firstName, 1, 1), lastName, YEAR(birthDate)) Quant
FROM Person
)
SELECT *
FROM DupPersons
WHERE Quant > 1
Of course, it would also help to know your table definition and the indexes you have created. I think it might help to add a computed column with the year of birthdate and create an index on it, and the same with the first letter of firstname.