How to use count() without the output changing when I use JOIN - PostgreSQL - SQL

I need to find the sum of all the rental days of a car dealership (including
period_begin and period_end, not unique) for all cars of that department, divided by the total number of different (unique) employees that department ever had.
I have 5 tables
Department(PK departmentnr, name, FK postcode, FK place_name)
Employee(PK employeenr, FK email)
Contract(PK periode_begin, PK periode_end, FK departmentnr, FK employeenr)
registerform(PK periode_begin, PK periode_end, FK email, FK numberplate)
car(PK numberplate,FK departmentnr,FK Brand,FK model)
when I go step by step
part 1
The total employees per department
SELECT departmentnr, Count(employeenr)
FROM contract
GROUP BY departmentnr
part 2
the amount of days the cars were hired
SELECT DISTINCT departmentnr,
Sum((( Date(periode_end) - Date(periode_begin) + 1 ))) AS
average
FROM registerform r
INNER JOIN car w using(numberplate)
GROUP BY departmentnr
I get the correct output, but when I try to combine the two:
SELECT distinct departmentnr,
(
sum(((date(r.periode_end) - date(r.periode_begin) + 1))) / (
select
count(employeenr))
)
as average
from
registerform r
inner join
car w using(numberplate)
inner join
contract using(departmentnr)
inner join
employee using(employeenr)
group by
departmentnr
then my output becomes absurd.
How can I fix this, and is there a way to make the code more efficient?

Aggregate before you JOIN. So, one method is:
SELECT c.departmentnr, co.num_employees,
       -- total rental days divided by the number of unique employees
       Sum( Date(r.periode_end) - Date(r.periode_begin) + 1 ) * 1.0 / co.num_employees AS average
FROM registerform r JOIN
     car c
     USING (numberplate) LEFT JOIN
     (SELECT co.departmentnr, Count(DISTINCT co.employeenr) as num_employees
      FROM contract co
      GROUP BY co.departmentnr
     ) co
     ON co.departmentnr = c.departmentnr
GROUP BY c.departmentnr, co.num_employees;
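To see why the joined version goes wrong, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, the tables are cut down to the columns the query needs, and julianday() stands in for PostgreSQL's date subtraction, so treat it as an illustration rather than the exact PostgreSQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (numberplate TEXT PRIMARY KEY, departmentnr INTEGER);
CREATE TABLE contract (departmentnr INTEGER, employeenr INTEGER);
CREATE TABLE registerform (numberplate TEXT, periode_begin TEXT, periode_end TEXT);

INSERT INTO car VALUES ('AAA-1', 1), ('BBB-2', 1);
-- employee 10 has two contracts with department 1
INSERT INTO contract VALUES (1, 10), (1, 11), (1, 10);
INSERT INTO registerform VALUES
    ('AAA-1', '2020-01-01', '2020-01-03'),  -- 3 days, both ends included
    ('BBB-2', '2020-02-01', '2020-02-05');  -- 5 days
""")

# SQLite has no date subtraction operator, so julianday() is used instead.
DAYS = "CAST(julianday(r.periode_end) - julianday(r.periode_begin) + 1 AS INTEGER)"

# Joining contract directly repeats every rental once per contract row.
naive_rows = conn.execute(f"""
    SELECT w.departmentnr, SUM({DAYS}) AS total_days
    FROM registerform r
    JOIN car w USING (numberplate)
    JOIN contract co ON co.departmentnr = w.departmentnr
    GROUP BY w.departmentnr
""").fetchall()

# Aggregating contract first leaves one row per department to join against.
fixed_rows = conn.execute(f"""
    SELECT w.departmentnr,
           SUM({DAYS}) AS total_days,
           co.num_employees,
           SUM({DAYS}) * 1.0 / co.num_employees AS average
    FROM registerform r
    JOIN car w USING (numberplate)
    LEFT JOIN (SELECT departmentnr, COUNT(DISTINCT employeenr) AS num_employees
               FROM contract
               GROUP BY departmentnr) co
           ON co.departmentnr = w.departmentnr
    GROUP BY w.departmentnr, co.num_employees
""").fetchall()

print(naive_rows)  # rental days are counted three times (one per contract row)
print(fixed_rows)  # 8 rental days shared by 2 unique employees
```

The naive query reports 24 days because each of the two rentals is matched with all three contract rows; the pre-aggregated version keeps the 8 real days.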

Related

Extracting several math operations outputs from single select query

I have three tables that I need to merge to analyse: active, students and bills.
'Active' contains records on active students and the subjects they have been active in, with columns: id (student id) int, time (time they were active) timestamp, and subject (subject in which they were active) text
id time subject
1 2020-04-23 06:53:30 Math
2 2020-05-13 09:51:22 Physics
2 2020-02-26 17:34:56 History
'Students' is the mass database containing: id (student id) int, group (the group to which student was assigned for a/b test) - text
id group
1 A
2 B
3 A
4 A
'Bills' keeps record of all transactions for courses that student purchased: id (student id) int, sale_time (time when student purchased course) timestamp, subject (subject in which course purchased) text, money (amount paid).
id sale_time subject money
1 2020-03-04 08:54:55 Math 4300
1 2020-04-08 20:43:56 Math 3200
2 2020-05-09 13:43:12 Law 8900
Basically, we have a student database (Students) some of which purchased courses (Bills). While some of those who purchased remain active (Active).
I need to write ONE SINGLE query where I can extract the following grouped by whether they belong to A or B group:
average revenue per user: sum (money) / count (distinct Students.id)
average revenue per active user: sum (money) / count (distinct Active.id)
conversion rate (%): count (distinct Bills.id) / count (distinct Students.id)
conversion rate (active) (%): count (distinct Bills.id) / count (distinct Active.id)
conversion rate (Math) (%) (count (distinct Bills.id) where Bills.subject = Math) / (count (distinct Active.id) where Active.subject = Math)
All these in single query!
I used
select sum (money)/count (distinct Students.id)
from Students
left join Bills using (id)
left join Active using (id)
group by group, Students.id
but I don't know how to do these math calculations all in one right after select with filters.
Please help!
SQL fiddle: https://www.db-fiddle.com/f/NPQR6aBf8H36XvrefJY2J/0
All you need is this:
select s.[group],
sum(money) / NULLIF(count(distinct s.id), 0) as AvgPerUser,
sum(money) / NULLIF(count(distinct a.id), 0) as AvgActUser,
1.0 * count(distinct b.id) / NULLIF(count(distinct s.id), 0) as CovRate,
1.0 * count(distinct b.id) / NULLIF(count(distinct a.id), 0) as ConActRate,
1.0 * (select count(distinct b2.id) from Bills as b2 where b2.subject = 'Math') /
NULLIF((select count(distinct a2.id) from Active as a2 where a2.subject = 'Math'), 0) as ConRateMath
from Students as s
left join Bills as b on s.id = b.id
left join Active as a on s.id = a.id
group by s.[group]
I would recommend removing duplicates before joining (one row per student from each table) and then aggregating per group:
select s."group",
       sum(b.money) * 1.0 / count(*) as AvgPerUser,
       sum(b.money) * 1.0 / nullif(count(a.id), 0) as AvgActUser,
       count(b.id) * 1.0 / count(*) as CovRate,
       count(b.id) * 1.0 / nullif(count(a.id), 0) as ConActRate,
       count(*) filter (where b.bought_math) * 1.0 /
           nullif(count(*) filter (where a.active_math), 0) as ConRateMath
from Students s left join
     (select b.id, sum(b.money) as money,
             bool_or(b.subject = 'Math') as bought_math  -- did this student buy Math?
      from bills b
      group by b.id
     ) b
     on s.id = b.id left join
     (select a.id, bool_or(a.subject = 'Math') as active_math  -- was this student active in Math?
      from active a
      group by a.id
     ) a
     on s.id = a.id
group by s."group";
Note: I don't think you want s.id in the GROUP BY. That really would not be aggregating anything.
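As a rough check of the "one row per student before joining" idea, here is a sketch in Python's sqlite3 using the sample rows from the question (the time columns are left out, and SQLite stands in for whatever database the fiddle uses). It compares the revenue per group from a plain double LEFT JOIN with the pre-aggregated version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, "group" TEXT);
CREATE TABLE active (id INTEGER, subject TEXT);
CREATE TABLE bills (id INTEGER, subject TEXT, money INTEGER);

INSERT INTO students VALUES (1, 'A'), (2, 'B'), (3, 'A'), (4, 'A');
INSERT INTO active VALUES (1, 'Math'), (2, 'Physics'), (2, 'History');
INSERT INTO bills VALUES (1, 'Math', 4300), (1, 'Math', 3200), (2, 'Law', 8900);
""")

# Plain joins: student 2 has two Active rows, so the single bill is counted twice.
naive_rows = conn.execute("""
    SELECT s."group", SUM(b.money) AS revenue
    FROM students s
    LEFT JOIN bills b ON b.id = s.id
    LEFT JOIN active a ON a.id = s.id
    GROUP BY s."group"
    ORDER BY s."group"
""").fetchall()

# One row per student from each side, then aggregate per group.
fixed_rows = conn.execute("""
    SELECT s."group",
           COALESCE(SUM(b.money), 0) AS revenue,
           COUNT(*)    AS students,
           COUNT(b.id) AS buyers,
           COUNT(a.id) AS active_students
    FROM students s
    LEFT JOIN (SELECT id, SUM(money) AS money FROM bills GROUP BY id) b
           ON b.id = s.id
    LEFT JOIN (SELECT DISTINCT id FROM active) a
           ON a.id = s.id
    GROUP BY s."group"
    ORDER BY s."group"
""").fetchall()

for grp, revenue, students, buyers, active_students in fixed_rows:
    # The ratios asked for in the question follow directly from these counts.
    print(grp, revenue / students, buyers / students)
```

With the plain joins, group B shows 17800 instead of 8900; after pre-aggregation every student contributes exactly one row, so the sums and counts can be divided safely.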

SELECT * FROM table in addition of aggregation function

Short context:
I would like to show a list of all companies except if they are in the sector 'defense' or 'government' and their individual total spent on training classes. Only the companies that have this total amount above 1000 must be shown.
So I wrote the following query:
SELECT NAME, ADDRESS, ZIP_CODE, CITY, SUM(FEE-PROMOTION) AS "Total spent on training at REX"
FROM COMPANY INNER JOIN PERSON ON (COMPANY_NUMBER = EMPLOYER) INNER JOIN ENROLLMENT ON (PERSON_ID = STUDENT)
WHERE SECTOR_CODE NOT IN (SELECT CODE
FROM SECTOR
WHERE DESCRIPTION = 'Government' OR DESCRIPTION = 'Defense')
GROUP BY NAME, ADDRESS, ZIP_CODE, CITY
HAVING SUM(FEE-PROMOTION) > 1000
ORDER BY SUM(FEE-PROMOTION) DESC
Now what I actually need is, instead of defining every single column in the COMPANY table, I would like to show ALL columns of the COMPANY table using *.
SELECT * (all columns from COMPANY here), SUM(FEE-PROMOTION) AS "Total spent on training at REX"
FROM COMPANY INNER JOIN PERSON ON (COMPANY_NUMBER = EMPLOYER) INNER JOIN ENROLLMENT ON (PERSON_ID = STUDENT)
WHERE SECTOR_CODE NOT IN (SELECT CODE
FROM SECTOR
WHERE DESCRIPTION = 'Government' OR DESCRIPTION = 'Defense')
GROUP BY * (How to fix it here?)
HAVING SUM(FEE-PROMOTION) > 1000
ORDER BY SUM(FEE-PROMOTION) DESC
I could define every single column from COMPANY in the SELECT and that solution will do the job (as in the first example), but how can I make the query shorter using "SELECT * from the table COMPANY"?
The key idea is to summarize in a subquery to get the total spend per company. This allows you to remove the aggregation from the outer query:
select c.*, pe.total_spend
from company c join
     sector s
     on c.sector_code = s.code left join
     (select p.employer, sum(e.fee - e.promotion) as total_spend
      from person p join
           enrollment e
           on p.person_id = e.student
      group by p.employer
     ) pe
     on pe.employer = c.company_number
where s.description not in ('Government', 'Defense') and
      pe.total_spend > 1000
order by pe.total_spend desc
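A small sketch of this pattern in Python's sqlite3, with made-up companies and a COMPANY table cut down to three columns (the real table presumably has more, which is exactly why c.* is convenient). Because the aggregation lives in the subquery, the outer query needs no GROUP BY at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sector (code TEXT PRIMARY KEY, description TEXT);
CREATE TABLE company (company_number INTEGER PRIMARY KEY, name TEXT, sector_code TEXT);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, employer INTEGER);
CREATE TABLE enrollment (student INTEGER, fee INTEGER, promotion INTEGER);

INSERT INTO sector VALUES ('IT', 'Technology'), ('DEF', 'Defense');
INSERT INTO company VALUES (1, 'Acme', 'IT'), (2, 'Army Supplies', 'DEF'),
                           (3, 'Tiny', 'IT'), (4, 'Bolt', 'IT');
INSERT INTO person VALUES (1, 1), (2, 1), (3, 2), (4, 3), (5, 4);
INSERT INTO enrollment VALUES (1, 800, 100), (2, 600, 0), (3, 5000, 0),
                              (4, 500, 0), (5, 2000, 0);
""")

rows = conn.execute("""
    SELECT c.*, pe.total_spend
    FROM company c
    JOIN sector s ON s.code = c.sector_code
    JOIN (SELECT p.employer, SUM(e.fee - e.promotion) AS total_spend
          FROM person p
          JOIN enrollment e ON e.student = p.person_id
          GROUP BY p.employer) pe
      ON pe.employer = c.company_number
    WHERE s.description NOT IN ('Government', 'Defense')
      AND pe.total_spend > 1000
    ORDER BY pe.total_spend DESC
""").fetchall()

print(rows)  # every COMPANY column, followed by the total spend
```

The defense company is filtered out by its sector and 'Tiny' by the 1000 threshold, so only 'Bolt' and 'Acme' remain.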

Is there an easier way to figure the query out

I have a movie table which has the year and movie details such as title and movie id (mid), and a table m_cast which holds all the actors in each movie.
I would like to get all the actors who have never been unemployed for more than 3 years (assuming actors are unemployed between two consecutive movies).
The code I came up with is:
select a.yr1 y1 , b.yr2 y2 , a.yr1 - b.yr2 diff from
(select substr(substr(trim(year),-5),0,5) yr1 , * from movie m inner join m_cast p on m.mid = p.mid order by pid , yr1) a ,
(select substr(substr(trim(year),-5),0,5) yr2 , * from movie m inner join m_cast p on m.mid = p.mid order by pid, yr2) b on a.yr1 > b.yr2
where not exists
(select count(*) from movie m inner join m_cast p on m.mid = p.mid
and cast(substr(substr(trim(year),-5),0,5) as integer) < a.yr1 and cast(substr(substr(trim(year),-5),0,5) as integer) > b.yr2)
The self join alone takes a lot of time, and the lag and lead functions do not work in the SQLite version I am using.
I'm assuming the movie table has a column called year, and a column to identify the actor's name. Something like : year int, actorId int
The fastest way to run your query is to filter the last 3 years from your movie table and then to group by your actors the distinct count of years.
Example after filtering
ActorId Year
1 2018
1 2018
1 2017
2 2016
2 2017
2 2018
Then group by and select distinct :
Select actorId from movieTable group by actorId having count (distinct (Year)) =3
And that will only return the actors who have worked in the last 3 years. Once you have your actors id's filtered out in that column do a join to the table that holds their names.
Sorry about the format of my writing - did it from my cellphone.
Regards,
Jorge D. Lopez
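If the goal is the one stated in the question (no gap of more than 3 idle years between consecutive movies, over the whole career), a correlated subquery can stand in for lead() on SQLite versions without window functions. The following is only a sketch: it assumes the year has already been cleaned into an integer column, and it counts idle years as next_year - year - 1, which is one possible reading of "unemployed for more than 3 years".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (mid INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE m_cast (mid INTEGER, pid INTEGER);

INSERT INTO movie VALUES (1, 2010), (2, 2012), (3, 2015), (4, 2016), (5, 2018);
INSERT INTO m_cast VALUES
    (1, 1), (2, 1), (3, 1),  -- actor 1: 2010, 2012, 2015
    (1, 2), (4, 2),          -- actor 2: 2010, 2016 (5 idle years)
    (5, 3);                  -- actor 3: a single movie
""")

rows = conn.execute("""
    WITH work AS (           -- one row per actor and year worked
        SELECT DISTINCT p.pid, m.year
        FROM m_cast p
        JOIN movie m ON m.mid = p.mid
    ),
    gaps AS (                -- next year worked, found without lead()
        SELECT w.pid, w.year,
               (SELECT MIN(n.year)
                FROM work n
                WHERE n.pid = w.pid AND n.year > w.year) AS next_year
        FROM work w
    )
    SELECT pid
    FROM gaps
    GROUP BY pid
    -- the last movie has no next_year, so its gap counts as 0
    HAVING MAX(COALESCE(next_year - year - 1, 0)) <= 3
    ORDER BY pid
""").fetchall()

print(rows)
```

Actor 2 is dropped because of the gap between 2010 and 2016. An index on m_cast(pid) (hypothetical, not mentioned in the question) would likely help the correlated lookup on a large table.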

Oracle sql - referencing tables

My school task was to get, from my movie database, the names of the actors who play in the movies with the highest rating.
I did it this way and it works:
select name,surname
from actor
where ACTORID in(
select actorid
from actor_movie
where MOVIEID in (
select movieid
from movie
where RATINGID in (
select ratingid
from rating
where PERCENT_CSFD = (
select max(percent_csfd)
from rating
)
)
)
);
the output is :
Gary Oldman
Sigourney Weaver
...but I would also like to add the movie and its rating to this select. They are accessible in the inner selects, but I don't know how to join them with the outer select, in which I can work only with rows found in the Actor table.
Thank you for your answers.
You just need to join the tables properly. Afterwards you can simply add the columns you'd like to select. The final select could look like this:
select ac.name, ac.surname -- add further columns from mo and ra here, each preceded by a comma
from actor ac
inner join actor_movie amo
on amo.actorid = ac.actorid
inner join movie mo
on amo.movieid = mo.movieid
inner join rating ra
on ra.ratingid = mo.ratingid
where ra.PERCENT_CSFD =
(select max(percent_csfd)
from rating)
A slightly different way to get your result could be something like:
select *
from
(
select name, surname, percent_csfd, rank() over (order by percent_csfd desc) as rnk
from actor
inner join actor_movie
using (actorId)
inner join movie
using (movieId)
inner join rating
using (ratingId)
)
where rnk = 1
This uses rank() to rank each row by its movie's rating and then keeps only the rows for the movie(s) with the highest rating. (row_number() would give 1 to a single row only, even when several actors share the top-rated movie.)
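Here is a short sketch of the join version with the movie and its rating added to the output, run through Python's sqlite3 instead of Oracle. The title column and the sample movies are assumptions for illustration; only the table and key names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rating (ratingid INTEGER PRIMARY KEY, percent_csfd INTEGER);
CREATE TABLE movie (movieid INTEGER PRIMARY KEY, title TEXT, ratingid INTEGER);
CREATE TABLE actor (actorid INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE actor_movie (actorid INTEGER, movieid INTEGER);

INSERT INTO rating VALUES (1, 90), (2, 75);
INSERT INTO movie VALUES (1, 'Movie A', 1), (2, 'Movie B', 2);  -- title is hypothetical
INSERT INTO actor VALUES (1, 'Gary', 'Oldman'), (2, 'Sigourney', 'Weaver'),
                         (3, 'Some', 'Actor');
INSERT INTO actor_movie VALUES (1, 1), (2, 1), (3, 2);
""")

rows = conn.execute("""
    SELECT ac.name, ac.surname, mo.title, ra.percent_csfd
    FROM actor ac
    JOIN actor_movie amo ON amo.actorid = ac.actorid
    JOIN movie mo        ON mo.movieid = amo.movieid
    JOIN rating ra       ON ra.ratingid = mo.ratingid
    WHERE ra.percent_csfd = (SELECT MAX(percent_csfd) FROM rating)
    ORDER BY ac.surname
""").fetchall()

print(rows)
```

Because every table is now part of the FROM clause, any of its columns can be listed in the SELECT, which the nested IN version does not allow.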

Query with aggregating data from 2 tables

I have three tables in my Oracle db:
Peoples:
IdPerson PK
Name
Surname
Earnings:
IdEarning
IdPerson
EarningValue
Awards:
IdAward PK
IdPerson FK
AwardDescription
A person can have many earnings, one earning, or no earnings at all.
A person can have many awards, one award, or no award at all.
I want to make a query that will return 3 columns:
Surname
SUM of all EarningValue of person with this surname
COUNT of all Awards for this person
An important thing is that I also need to display a 0 value if a person doesn't have any award or earning. It is possible for a person to have an award but no earnings.
Is it possible to make such query?
SELECT p.IdPerson,
p.Surname,
NVL(SUM(e.EarningValue), 0) as SumEarnings,
COUNT(a.IdAward) as CntAwards
FROM Peoples p
LEFT JOIN Earnings e ON p.IdPerson = e.IdPerson
LEFT JOIN Awards a ON p.IdPerson = a.IdPerson
GROUP BY p.IdPerson,
p.Surname;
What does this return?
SELECT p.Surname,
(SELECT NVL(SUM(e.EarningValue), 0)
FROM Earnings e WHERE e.IdPerson = p.IdPerson) as SumEarnings,
(SELECT COUNT(a.IdAward)
FROM Awards a WHERE a.IdPerson = p.IdPerson) as CntAwards
FROM Peoples p
Yes, of course it is possible.
You just need to join the tables using multiple joins and then apply the Sum() function to get the sum of earnings and Count() to count the number of awards.
Try the query below:
SELECT P.Surname,
NVL(Sum(E.EarningValue), 0) AS Total_Earnings,
Count(A.IdAward) AS total_awards
FROM Peoples P
LEFT JOIN Earnings E
ON P.IdPerson = E.IdPerson
LEFT JOIN Awards A
ON A.IdPerson = P.IdPerson
GROUP BY P.IdPerson, P.Surname;
Yes, it is possible; you just have to join the tables (with LEFT JOIN, so that people without earnings or awards are kept):
SELECT Peoples.Surname, NVL(SUM(Earnings.EarningValue), 0) as Earnings, COUNT(Awards.IdPerson) as Awards
FROM Peoples
LEFT JOIN Earnings
ON Peoples.IdPerson = Earnings.IdPerson
LEFT JOIN Awards
ON Peoples.IdPerson = Awards.IdPerson
GROUP BY Peoples.IdPerson, Peoples.Surname;
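One thing worth checking with the answers that join both Earnings and Awards in the same query: when a person has several rows in each table, the two joins multiply each other. The sketch below uses Python's sqlite3 with made-up rows to compare that approach with the correlated-subquery version; COALESCE replaces Oracle's NVL because SQLite does not have NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Peoples (IdPerson INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
CREATE TABLE Earnings (IdEarning INTEGER PRIMARY KEY, IdPerson INTEGER, EarningValue INTEGER);
CREATE TABLE Awards (IdAward INTEGER PRIMARY KEY, IdPerson INTEGER, AwardDescription TEXT);

INSERT INTO Peoples VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Jones'), (3, 'Cy', 'Brown');
INSERT INTO Earnings VALUES (1, 1, 100), (2, 1, 200);              -- Smith only
INSERT INTO Awards VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 1, 'z'),   -- Smith: 3 awards
                          (4, 2, 'w');                             -- Jones: award, no earnings
""")

# Both child tables joined at once: 2 earnings x 3 awards = 6 rows for Smith.
joined_rows = conn.execute("""
    SELECT p.Surname,
           COALESCE(SUM(e.EarningValue), 0) AS SumEarnings,
           COUNT(a.IdAward) AS CntAwards
    FROM Peoples p
    LEFT JOIN Earnings e ON e.IdPerson = p.IdPerson
    LEFT JOIN Awards a   ON a.IdPerson = p.IdPerson
    GROUP BY p.IdPerson, p.Surname
    ORDER BY p.IdPerson
""").fetchall()

# Each total computed on its own, so the tables cannot inflate each other.
subquery_rows = conn.execute("""
    SELECT p.Surname,
           (SELECT COALESCE(SUM(e.EarningValue), 0)
            FROM Earnings e WHERE e.IdPerson = p.IdPerson) AS SumEarnings,
           (SELECT COUNT(a.IdAward)
            FROM Awards a WHERE a.IdPerson = p.IdPerson) AS CntAwards
    FROM Peoples p
    ORDER BY p.IdPerson
""").fetchall()

print(joined_rows)
print(subquery_rows)
```

In this sample the joined query reports 900 and 6 for Smith instead of 300 and 3, while people with at most one row in each table come out the same either way.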