SQL - average distance per day

I have a problem here: "Show average distance per day driven by cars from Paris"
I also have two tables related to this problem:
table_cars: id, brand, type, license
table_distances: id_car, date, distance
I have managed to select "the average distance for the cars from Paris"
select avg(table_distances.distance)
from table_distances
INNER JOIN table_cars ON table_distances.id_car = table_cars.id
where table_cars.license = 'Paris';
However, I still have a problem with the average distance per day. I looked over related questions on Stack Overflow and Google, but I only got more confused.
Can somebody explain how I can improve my query to show average distance per day?

This should get you the average distance per car per date.
SELECT id_car, date, AVG(table_distances.distance)
FROM table_distances
INNER JOIN table_cars
ON table_distances.id_car = table_cars.id
WHERE table_cars.license = 'Paris'
GROUP BY id_car, date
ORDER BY id_car, date

Simply add the date to what you select and group by it, so the distance is averaged per date:
SELECT table_distances.date, avg(table_distances.distance)
FROM table_distances
INNER JOIN table_cars ON table_distances.id_car = table_cars.id
WHERE table_cars.license = 'Paris'
GROUP BY table_distances.date

Simply add a GROUP BY clause on your date column, and select the date so each average can be matched to its day:
SELECT td.date, AVG(td.distance)
FROM table_distances td
INNER JOIN table_cars tc ON td.id_car = tc.id
WHERE tc.license = 'Paris'
GROUP BY td.date;
This gives you the average distance per car for each day.
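To check the GROUP BY approach on concrete numbers, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the rows are made-up sample data, and SQLite is only an assumption for illustration (the question does not name a database).

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_cars (id INTEGER PRIMARY KEY, brand TEXT, type TEXT, license TEXT);
CREATE TABLE table_distances (id_car INTEGER, date TEXT, distance REAL);
INSERT INTO table_cars VALUES (1, 'Renault', 'Clio', 'Paris'),
                              (2, 'Peugeot', '208', 'Paris'),
                              (3, 'Citroen', 'C3', 'Lyon');
INSERT INTO table_distances VALUES (1, '2024-01-01', 10), (2, '2024-01-01', 30),
                                   (3, '2024-01-01', 500),
                                   (1, '2024-01-02', 20), (2, '2024-01-02', 40);
""")

# One output row per date; the Lyon car is filtered out before averaging.
avg_per_day = conn.execute("""
    SELECT td.date, AVG(td.distance)
    FROM table_distances td
    INNER JOIN table_cars tc ON td.id_car = tc.id
    WHERE tc.license = 'Paris'
    GROUP BY td.date
    ORDER BY td.date
""").fetchall()
print(avg_per_day)
```

Each row pairs a date with the mean distance of the Paris cars that drove on that date.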

Related

How to only include full months - Postgres

I have the following SQL query:
SELECT
SUM(amount)
FROM
(SELECT
l.human_readable_id,
DATE_TRUNC('day', c.created_date)::TIMESTAMP AS Date,
(ROUND(c.amount/100.00, 2))::DOUBLE PRECISION AS amount,
(ROUND(c.amount/100.00, 2)*0.04)::DOUBLE PRECISION AS Repayment,
c.currency,
c.payment_type,
c.status,
c.payment_id
FROM
loan_applications AS l
LEFT JOIN
merchants AS m ON l.merchant_id = m.id
LEFT JOIN
codat_companies AS cc ON m.id = cc.merchant_id
LEFT JOIN
codat_commerce_payments AS c ON cc.id = c.codat_company_id
WHERE
amount IS NOT NULL) AS subquery
GROUP BY
date
With it I get the sum for every month, and based on this I can calculate the average. Is it possible to only include full months? For instance, this is data from the 1st of May 2021 until yesterday, but including the current, incomplete month would drag down the overall monthly average.
Thanks in advance
You can add a condition to the WHERE clause that excludes the current month:
WHERE amount IS NOT NULL
AND c.created_date < date_trunc('month', Now())
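The effect of that cutoff can be tried out with a small sqlite3 sketch. Everything here is an assumption for illustration: a single hypothetical `payments` table replaces the joined tables, SQLite's `date(..., 'start of month')` stands in for Postgres `date_trunc('month', now())`, and "today" is passed as a fixed parameter so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (created_date TEXT, amount REAL);  -- hypothetical table
INSERT INTO payments VALUES ('2021-05-03', 100), ('2021-05-20', 50),
                            ('2021-06-10', 200),
                            ('2021-07-02', 999);  -- belongs to the incomplete month
""")

today = "2021-07-15"  # fixed stand-in for now()

# Keep only rows created before the first day of the current month,
# then total them per calendar month.
monthly_totals = conn.execute("""
    SELECT strftime('%Y-%m', created_date) AS month, SUM(amount)
    FROM payments
    WHERE amount IS NOT NULL
      AND created_date < date(?, 'start of month')
    GROUP BY month
    ORDER BY month
""", (today,)).fetchall()
print(monthly_totals)
```

The July payment is dropped because its date is not earlier than the start of the month that contains `today`.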

How to apply two where condition one inside subquery and the other out of subquery?

Find the customers who spent less than $3 on each individual film rental, but have spent a total of more than $15.
The query that I wrote is given below
SELECT CustomerID,Customer.CustomerFirstName,Customer.CustomerSurname,Total FROM (SELECT DISTINCT Customer.CustomerID,Customer.CustomerFirstName,Customer.CustomerSurname,sum(([Rental].[Quantity])*([Film].[FilmPrice])) AS Total
FROM Film RIGHT JOIN (Customer INNER JOIN Rental ON Customer.CustomerID = Rental.CustomerIDFk) ON Film.FilmID = Rental.FlimIDFk
GROUP BY Customer.CustomerID,Customer.CustomerFirstName,Customer.CustomerSurname) T WHERE Total>15;
Now, how can I apply the second condition, which is FilmPrice<3?
Please help me out.
This is the ERD
Thanks
Simply add another WHERE clause inside the subquery, after the join's ON condition but before the GROUP BY: WHERE FilmPrice<3
You want to find the customers where the maximum price of a film they have rented is less than 3 and the total price of all the films they have rented is greater than 15.
Something like this:
SELECT CustomerID,
CustomerFirstName,
CustomerSurname
FROM Customer
WHERE CustomerId IN (
SELECT r.CustomerIDFk
FROM Rental r
INNER JOIN Film f
ON ( f.FilmId = r.FilmIdFk )
GROUP BY r.CustomerIdFk
HAVING MAX(f.FilmPrice) < 3
AND SUM(f.FilmPrice * r.Quantity)>15
)
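A quick way to see what the HAVING version returns is to run it on a few rows. This is a sketch under assumptions: SQLite via Python's sqlite3, made-up customers and prices, and the foreign key spelled `FilmIDFk` (the question's query spells it `FlimIDFk`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY,
                       CustomerFirstName TEXT, CustomerSurname TEXT);
CREATE TABLE Film (FilmID INTEGER PRIMARY KEY, FilmPrice REAL);
CREATE TABLE Rental (CustomerIDFk INTEGER, FilmIDFk INTEGER, Quantity INTEGER);
INSERT INTO Customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
INSERT INTO Film VALUES (10, 2.0), (11, 2.5), (12, 5.0);
INSERT INTO Rental VALUES (1, 10, 4), (1, 11, 4),  -- Ann: only cheap films, total 18
                          (2, 10, 4), (2, 12, 2),  -- Bob: total 18, but one film costs 5
                          (3, 10, 2);              -- Cy: cheap films, total only 4
""")

# HAVING filters whole groups: every film must cost < 3 and the sum must exceed 15.
matches = conn.execute("""
    SELECT CustomerID, CustomerFirstName, CustomerSurname
    FROM Customer
    WHERE CustomerID IN (
        SELECT r.CustomerIDFk
        FROM Rental r
        INNER JOIN Film f ON f.FilmID = r.FilmIDFk
        GROUP BY r.CustomerIDFk
        HAVING MAX(f.FilmPrice) < 3
           AND SUM(f.FilmPrice * r.Quantity) > 15
    )
""").fetchall()
print(matches)
```

Note the difference from putting FilmPrice<3 in a WHERE clause: WHERE removes the expensive rentals before the sum is computed, while HAVING MAX(...) < 3 rejects any customer who rented an expensive film at all.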

SELECT table 2 inside a COUNT comparing to table 1

I have four tables: SCHED_FLIGHT, AIRCRAFT, PLANETYPE and RESERVATIONS. I want to do something like this:
select S.SCHED_NO as "Scheduled Flight Number", P.CAPACITY - COUNT (SELECT * FROM RESERVATIONS R WHERE R.SCHED_NO = S.SCHED_NO) "Remaining Seats"
from PLANETYPE P, AIRCRAFT A, SCHED_FLIGHT S
where S.SERIAL_NO = A.SERIAL_NO and A.TYPE_NO = P.TYPE_NO;
As you can see, each SCHED_FLIGHT has an AIRCRAFT SERIAL_NO, each AIRCRAFT has a PLANETYPE TYPE_NO, and each PLANETYPE has its own capacity. So each SCHED_FLIGHT has a capacity based on its plane, and I want to get the number of remaining seats by counting the RESERVATIONS made for that flight.
Of course the code doesn't work, but I have no idea how to solve this problem. Any tips?
edit: I've received some answers, but first I was just showing two tables as an example - but actually there's 4 tables involved and I would have to include a where clause or multiple joins... so I'm still confused. What should I do now? Look at my code.
Update:
I'm not sure that all of columns correspond to your real columns, but this query might look like this:
SELECT
SF.SCHED_NO AS "Scheduled Flight Number",
SF.CAPACITY - COUNT(R.SCHED_NO) AS "Remaining Seats"
FROM
(
SELECT
S.SCHED_NO,
P.CAPACITY
FROM SCHED_FLIGHT S
INNER JOIN AIRCRAFT A
ON S.SERIAL_NO = A.SERIAL_NO
INNER JOIN PLANETYPE P
ON A.TYPE_NO = P.TYPE_NO
) SF
LEFT JOIN RESERVATIONS R
ON SF.SCHED_NO = R.SCHED_NO
GROUP BY SF.SCHED_NO, SF.CAPACITY
I would be inclined to approach this as an aggregation with a join. First, aggregate the reservations by flight number. Then join that back to FLIGHT and calculate the remaining capacity:
SELECT F.FLIGHT_NO, (F.CAPACITY - COALESCE(cnt, 0)) as Remaining
FROM FLIGHT F LEFT JOIN
(SELECT R.FLIGHT_NO, COUNT(*) as cnt
FROM RESERVATIONS R
GROUP BY R.FLIGHT_NO
) R
ON R.FLIGHT_NO = F.FLIGHT_NO;
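The aggregate-then-join idea can be combined with the four tables from the question. The sketch below is an illustration only: it uses SQLite through Python's sqlite3, the join columns are taken from the question's WHERE clause, and the `RES_NO` key and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PLANETYPE (TYPE_NO INTEGER PRIMARY KEY, CAPACITY INTEGER);
CREATE TABLE AIRCRAFT (SERIAL_NO INTEGER PRIMARY KEY, TYPE_NO INTEGER);
CREATE TABLE SCHED_FLIGHT (SCHED_NO INTEGER PRIMARY KEY, SERIAL_NO INTEGER);
CREATE TABLE RESERVATIONS (RES_NO INTEGER PRIMARY KEY, SCHED_NO INTEGER);
INSERT INTO PLANETYPE VALUES (1, 100), (2, 50);
INSERT INTO AIRCRAFT VALUES (501, 1), (502, 2);
INSERT INTO SCHED_FLIGHT VALUES (9001, 501), (9002, 502);
INSERT INTO RESERVATIONS (SCHED_NO) VALUES (9001), (9001), (9001);
""")

# Count reservations per flight first, then LEFT JOIN so that flights
# without any reservation still appear (COALESCE turns their NULL into 0).
remaining = conn.execute("""
    SELECT S.SCHED_NO, P.CAPACITY - COALESCE(R.cnt, 0) AS remaining_seats
    FROM SCHED_FLIGHT S
    JOIN AIRCRAFT A ON S.SERIAL_NO = A.SERIAL_NO
    JOIN PLANETYPE P ON A.TYPE_NO = P.TYPE_NO
    LEFT JOIN (SELECT SCHED_NO, COUNT(*) AS cnt
               FROM RESERVATIONS
               GROUP BY SCHED_NO) R
           ON R.SCHED_NO = S.SCHED_NO
    ORDER BY S.SCHED_NO
""").fetchall()
print(remaining)
```

Flight 9002 has no reservations, so it keeps its full capacity instead of disappearing from the result.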

Keeping rows from double-counting in a GROUP BY

Here's the basic guts of my schema and problem: http://sqlfiddle.com/#!1/72ec9/4/2
Note that the periods table can refer to a variable range of time - it could be an entire season, it could be a few games or one game. For a given team and year all period rows represent exclusive ranges of time.
I've got a query written which joins up tables and uses a GROUP BY periods.year to aggregate scores for a season (see sqlfiddle). However, if a coach had two positions in the same year the GROUP BY will count the same period row twice. How can I ditch the duplicates when a coach held two positions but still sum up periods when a year is comprised of multiple periods? If there's a better way to do the schema I'd also appreciate it if you pointed it out to me.
The underlying problem (joining to multiple tables with multiple matches) is explained in this related answer:
Two SQL LEFT JOINS produce incorrect result
To fix, I first simplified & formatted your query:
select pe.year
, sum(pe.wins) AS wins
, sum(pe.losses) AS losses
, sum(pe.ties) AS ties
, array_agg(po.id) AS position_id
, array_agg(po.name) AS position_names
from periods_positions_coaches_linking pp
join positions po ON po.id = pp.position
join periods pe ON pe.id = pp.period
where pp.coach = 1
group by pe.year
order by pe.year;
Yields the same, incorrect result as your original, but simpler / faster / easier to read.
No point in joining the table coach as long as you don't use columns in the SELECT list. I removed it completely and replaced the WHERE condition with where pp.coach = 1.
You don't need COALESCE. NULL values are ignored in the aggregate function sum(). No need to substitute 0.
Use table aliases to make it easier to read.
Next, I solved your problem like this:
SELECT *
FROM (
SELECT pe.year
, array_agg(DISTINCT po.id) AS position_id
, array_agg(DISTINCT po.name) AS position_names
FROM periods_positions_coaches_linking pp
JOIN positions po ON po.id = pp.position
JOIN periods pe ON pe.id = pp.period
WHERE pp.coach = 1
GROUP BY pe.year
) po
LEFT JOIN (
SELECT pe.year
, sum(pe.wins) AS wins
, sum(pe.losses) AS losses
, sum(pe.ties) AS ties
FROM (
SELECT period
FROM periods_positions_coaches_linking
WHERE coach = 1
GROUP BY period
) pp
JOIN periods pe ON pe.id = pp.period
GROUP BY pe.year
) pe USING (year)
ORDER BY year;
Aggregate positions and periods separately before joining them.
In the first sub-query po list positions only once with array_agg(DISTINCT ...).
In the second sub-query pe ...
GROUP BY period, because a coach can have multiple positions per period.
JOIN to periods-data after that, and then aggregate to get sums.
db<>fiddle here
Old sqlfiddle
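The double counting, and the fix of reducing the linking table to one row per period before summing, can be reproduced with a small sqlite3 sketch. The table names follow the queries above; the rows are invented, and because SQLite has no array_agg only the sums are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (id INTEGER PRIMARY KEY, year INTEGER,
                      wins INTEGER, losses INTEGER, ties INTEGER);
CREATE TABLE periods_positions_coaches_linking (period INTEGER, position INTEGER,
                                                coach INTEGER);
INSERT INTO periods VALUES (1, 2010, 5, 2, 0), (2, 2010, 3, 1, 1), (3, 2011, 7, 3, 0);
-- Coach 1 held two positions (1 and 2) during period 1.
INSERT INTO periods_positions_coaches_linking VALUES (1, 1, 1), (1, 2, 1),
                                                     (2, 1, 1), (3, 1, 1);
""")

# Naive join: period 1 matches two linking rows and is summed twice.
naive = conn.execute("""
    SELECT pe.year, SUM(pe.wins), SUM(pe.losses), SUM(pe.ties)
    FROM periods_positions_coaches_linking pp
    JOIN periods pe ON pe.id = pp.period
    WHERE pp.coach = 1
    GROUP BY pe.year ORDER BY pe.year
""").fetchall()

# Fix: collapse the linking table to one row per period, then join and sum.
fixed = conn.execute("""
    SELECT pe.year, SUM(pe.wins), SUM(pe.losses), SUM(pe.ties)
    FROM (SELECT period
          FROM periods_positions_coaches_linking
          WHERE coach = 1
          GROUP BY period) pp
    JOIN periods pe ON pe.id = pp.period
    GROUP BY pe.year ORDER BY pe.year
""").fetchall()
print(naive, fixed)
```

Years made up of several periods are still summed correctly, because the deduplication happens per period and not per year.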
use distinct as shown here
code:
select periods.year as year,
sum(coalesce(periods.wins, 0)) as wins,
sum(coalesce(periods.losses, 0)) as losses,
sum(coalesce(periods.ties, 0)) as ties,
array_agg( distinct positions.id) as position_id,
array_agg( distinct positions.name) as position_names
from periods_positions_coaches_linking
join coaches on coaches.id = periods_positions_coaches_linking.coach
join positions on positions.id = periods_positions_coaches_linking.position
join periods on periods.id = periods_positions_coaches_linking.period
where coaches.id = 1
group by periods.year, positions.id
order by periods.year;
In your case, the easiest way might be to divide out the positions:
select periods.year as year,
sum(coalesce(periods.wins, 0))/COUNT(distinct positions.id) as wins,
sum(coalesce(periods.losses, 0))/COUNT(distinct positions.id) as losses,
sum(coalesce(periods.ties, 0))/COUNT(distinct positions.id) as ties,
array_agg(distinct positions.id) as position_id,
array_agg(distinct positions.name) as position_names
from periods_positions_coaches_linking join
coaches
on coaches.id = periods_positions_coaches_linking.coach join
positions
on positions.id = periods_positions_coaches_linking.position join
periods
on periods.id = periods_positions_coaches_linking.period
where coaches.id = 1
group by periods.year
order by periods.year;
The number of positions scales the wins, losses, and ties, so dividing it out adjusts the counts.

How do I select the Max in this query? Help for exam

So, I'm going through a lot of exercises for a final SQL exam I have on Thursday, and I came across another query I'm having doubts about.
The tables in the exercise are supposed to be from a hotel DB. You have three tables involved:
STAY            ROOM               ROOM_TYPE
===========     ===============    ===============
PK ID_STAY      PK ID_ROOM         PK ID_ROOM_TYPE
DAYS_QUANT      ID_ROOM_TYPE FK    DESCRIPTION
DATE            PRICE
ID_ROOM FK
The query they are asking me to do is "Show all data for the Room that has been rented for the highest amount of days (in total) in 2011, by room type (you have to show ID Room Type and Description)"
This is the way I solved it, I don't know if it's ok:
SELECT RT.ID_ROOM_TYPE, RT.DESCRIPTION, R.*, SUM(S.DAYS_QUANT)
FROM STAY S, ROOM R, ROOM_TYPE RT
WHERE YEAR(S.DATE) = '2011'
GROUP BY RT.ID_ROOM_TYPE, RT.DESCRIPTION, R.*
ORDER BY SUM(S.DAYS_QUANT) DESC
LIMIT 1
So, the first thing I'm not sure about is the R.* I included. Can I put it like that in a SELECT? Can it also be included like that in a GROUP BY?
The other thing I'm not sure about is whether I will be allowed to use LIMIT or SELECT TOP 1 in the exam. Can anyone think of a way to solve this without using those, for example with MAX() or something similar?
I believe you are not allowed to use CTEs, so I expanded the last part of Steve Kass's answer. You can get the desired result without TOP or LIMIT by comparing the total days each room was occupied with the maximum total days for any room of the same type. To do so, first sum the days per room; then, using an identical derived table, take the maximum of those totals per room type. Joining the two on room type and total days isolates the most-used rooms. Finally, join the base tables to show all the data. Unlike TOP or LIMIT, this returns more than one row per room type in case of a tie.
P.S. this is NOT tested. I believe it will work, but there might be a typo.
select r.*, rt.*, roomDays.TotalDays
from Room r inner join Room_type rt
on r.id_room_type = rt.id_room_type
inner join
(select Room.id_room, Room.id_room_type, sum(days_quant) TotalDays
from Stay
inner join Room
on Stay.id_room = Room.id_room
where year(Date) = 2011
group by Room.id_room, Room.id_room_type) roomDays
on r.id_room = roomDays.id_room
inner join
(select id_room_type, max(TotalDays) TotalDays
from
(select Room.id_room, Room.id_room_type, sum(days_quant) TotalDays
from Stay
inner join Room
on Stay.id_room = Room.id_room
where year(Date) = 2011
group by Room.id_room, Room.id_room_type) roomDaysHelper
group by id_room_type) roomTypeDays
on r.id_room_type = roomTypeDays.id_room_type
and roomDays.TotalDays = roomTypeDays.TotalDays
select r.*, t.*
from room r
join room_type t on t.id_room_type = r.id_room_type
where r.id_room in
(select
(select r.id_room
from room r
join stay s on s.id_room = r.id_room
where year(s.date) = '2011'
and r.id_room_type = t.id_room_type
group by r.id_room
order by sum(s.days_quant) desc
limit 1) room_id
from room_type t)
It's always possible to avoid LIMIT 1 or SELECT TOP. One way is to express the top row as the row for which there is no higher row. WHERE NOT EXISTS expresses the idea of "for which there is no."
One way to think of this is as follows: Select those rooms (along with their total days and type information) for which there is no room of the same type with a greater number of total days. That gives you this query (not carefully proofread):
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
)
select *
from StayTotals as T1
where not exists (
select *
from StayTotals as T2
where T2.ID_ROOM_TYPE = T1.ID_ROOM_TYPE
and T2.TotalDays2011 > T1.TotalDays2011
);
If you can't use CTEs (the WITH clause), you can rewrite it using subqueries, but it's awkward.
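The NOT EXISTS formulation can be run on a handful of rows to see how it behaves, including on a tie. This is a sketch with assumptions: SQLite through Python's sqlite3, invented rooms and stays, and `strftime('%Y', ...)` in place of `YEAR(...)`, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ROOM_TYPE (ID_ROOM_TYPE INTEGER PRIMARY KEY, DESCRIPTION TEXT);
CREATE TABLE ROOM (ID_ROOM INTEGER PRIMARY KEY, ID_ROOM_TYPE INTEGER, PRICE REAL);
CREATE TABLE STAY (ID_STAY INTEGER PRIMARY KEY, DAYS_QUANT INTEGER,
                   DATE TEXT, ID_ROOM INTEGER);
INSERT INTO ROOM_TYPE VALUES (1, 'Single'), (2, 'Double');
INSERT INTO ROOM VALUES (101, 1, 50), (102, 1, 55), (201, 2, 80), (202, 2, 85);
INSERT INTO STAY VALUES (1, 3, '2011-02-01', 101), (2, 4, '2011-06-10', 101),
                        (3, 5, '2011-03-05', 102), (4, 6, '2011-04-01', 201),
                        (5, 6, '2011-08-15', 202),
                        (6, 30, '2010-12-01', 102);  -- outside 2011, ignored
""")

# A room is a "top" room if no room of the same type has more total days.
top_rooms = conn.execute("""
    WITH StayTotals AS (
        SELECT S.ID_ROOM, T.ID_ROOM_TYPE, T.DESCRIPTION,
               SUM(S.DAYS_QUANT) AS TotalDays2011
        FROM STAY S
        JOIN ROOM R ON S.ID_ROOM = R.ID_ROOM
        JOIN ROOM_TYPE T ON R.ID_ROOM_TYPE = T.ID_ROOM_TYPE
        WHERE strftime('%Y', S.DATE) = '2011'
        GROUP BY S.ID_ROOM, T.ID_ROOM_TYPE, T.DESCRIPTION
    )
    SELECT * FROM StayTotals T1
    WHERE NOT EXISTS (
        SELECT 1 FROM StayTotals T2
        WHERE T2.ID_ROOM_TYPE = T1.ID_ROOM_TYPE
          AND T2.TotalDays2011 > T1.TotalDays2011
    )
    ORDER BY T1.ID_ROOM
""").fetchall()
print(top_rooms)
```

Rooms 201 and 202 are tied at 6 days, so both are returned for the 'Double' type, which LIMIT 1 would not do.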
Ranking functions have been part of the SQL standard for quite a while. If you can use them, this may also work:
with StayTotals as (
select
STAY.ID_ROOM,
ROOM_TYPE.ID_ROOM_TYPE,
ROOM_TYPE.DESCRIPTION,
SUM(STAY.DAYS_QUANT) AS TotalDays2011
from STAY join ROOM on STAY.ID_ROOM = ROOM.ID_ROOM
join ROOM_TYPE on ROOM.ID_ROOM_TYPE = ROOM_TYPE.ID_ROOM_TYPE
where YEAR(STAY.DATE) = 2011
group by STAY.ID_ROOM, ROOM_TYPE.ID_ROOM_TYPE, ROOM_TYPE.DESCRIPTION
), StayTotalsRankedByType as (
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011,
RANK() OVER (
PARTITION BY ID_ROOM_TYPE
ORDER BY TotalDays2011 DESC
) as RankInRoomType
from StayTotals
)
select
ID_ROOM,
ID_ROOM_TYPE,
DESCRIPTION,
TotalDays2011
from StayTotalsRankedByType
where RankInRoomType = 1;
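For comparison, here is a shortened sketch of the RANK() variant. To keep it brief it assumes the per-room totals are already available in a hypothetical `room_totals` table (standing in for the StayTotals CTE), with invented numbers. It uses SQLite via Python's sqlite3, which needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical precomputed totals, one row per room.
CREATE TABLE room_totals (id_room INTEGER, id_room_type INTEGER, total_days INTEGER);
INSERT INTO room_totals VALUES (101, 1, 7), (102, 1, 5), (201, 2, 6), (202, 2, 6);
""")

# RANK() restarts at 1 for every room type; tied rooms share the same rank.
ranked_top = conn.execute("""
    SELECT id_room, id_room_type, total_days
    FROM (
        SELECT id_room, id_room_type, total_days,
               RANK() OVER (PARTITION BY id_room_type
                            ORDER BY total_days DESC) AS rank_in_type
        FROM room_totals
    ) AS ranked
    WHERE rank_in_type = 1
    ORDER BY id_room
""").fetchall()
print(ranked_top)
```

Because RANK() gives tied rows the same rank, both rooms of type 2 come back; ROW_NUMBER() would instead pick exactly one row per type.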
Finally, one other way to pull in additional columns to describe the grouped MAX results is to use a "carryalong" sort, which was a handy technique before ranking functions were available. Adam Machanic gives an example here, and there are useful threads on the topic from Usenet, such as this one.
How about this?
select room.id_room, room_type.description, room.price
from room inner join room_type
on room.id_room_type = room_type.id_room_type
where room.id_room = (select id_room from stay
where year (date) = 2011
group by id_room
order by sum (days_quant) desc);
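Unfortunately, this query (as it is now) doesn't show for how many days the most popular room was rented. But there's no 'limit 1'! Note that the subquery returns one row per room, so most engines will reject the = comparison unless the subquery is restricted to a single row.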
Thank you all! With all the ideas you gave me I came up with this. Please let me know if you think it's OK:
SELECT R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION, SUM(S.DAYS_QUANT)
FROM ROOM R, ROOM_TYPE T, STAY S
(SELECT ID_STAY, SUM(S.DAYS_QUANT) TOTALDAYS
FROM STAY S
WHERE YEAR(S.DATE) = 2011
GROUP BY S.ID_STAY) STAYHELPER
WHERE YEAR(S.DATE) = 2011
GROUP BY R.ID_ROOM, R.ID_ROOM_TYPE, T.DESCRIPTION
HAVING SUM(S.DAYS_QUANT) >= ALL STAYHELPER.TOTALDAYS