SQL Select Query Group By With Key (Join or SubQuery) - sql

Trying to wrap my head around this query but here it is...
Table: TVEpisode
Columns: TVEpisodeID (PK), TVSeriesID, season (number), episode (number), watched (0 or 1)
What I am looking to get is the first unwatched (value 0) episode for each TVSeries. For example, if I have watched all of season 1 for a TVSeriesID (45) and my last watched episode is season 2 episode 5, I want the query to return:
TVEpisodeID | TVSeriesID | Season | Episode
PK | 45 | 2 | 6
Need that result for each TVSeries

In most databases, you would do this with the ANSI standard window functions:
select tve.*
from (select tve.*,
row_number() over (partition by tvseriesid order by season, episode) as seqnum
from tvepisode tve
where tve.watched = 0
) tve
where seqnum = 1;
I assume that "first" is referring to the combination of season and episode.
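To make the window-function approach concrete, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample rows are hypothetical, SQLite stands in for whatever database is actually in use, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE TVEpisode (
        TVEpisodeID INTEGER PRIMARY KEY,
        TVSeriesID  INTEGER,
        season      INTEGER,
        episode     INTEGER,
        watched     INTEGER
    )
""")
# Hypothetical rows: series 45 is watched through S2E5; series 46 is
# unwatched and its IDs were inserted out of airing order.
conn.executemany(
    "INSERT INTO TVEpisode VALUES (?, ?, ?, ?, ?)",
    [
        (1, 45, 1, 1, 1),
        (2, 45, 2, 5, 1),
        (3, 45, 2, 6, 0),
        (4, 45, 2, 7, 0),
        (5, 46, 1, 2, 0),
        (6, 46, 1, 1, 0),
    ],
)

# Number the unwatched episodes of each series by (season, episode)
# and keep only the first one per series.
first_unwatched = conn.execute("""
    SELECT TVEpisodeID, TVSeriesID, season, episode
    FROM (SELECT tve.*,
                 row_number() OVER (PARTITION BY TVSeriesID
                                    ORDER BY season, episode) AS seqnum
          FROM TVEpisode tve
          WHERE tve.watched = 0) AS t
    WHERE seqnum = 1
    ORDER BY TVSeriesID
""").fetchall()

print(first_unwatched)
```

For series 46 the row picked is the one with the lowest season and episode, not the lowest TVEpisodeID, which is the reason for ordering by season and episode explicitly.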

This should give you the first watched=0 row for each series, assuming TVEpisodeID increases with the season/episode order:
SELECT *
FROM TVEpisode
WHERE TVEpisodeID IN
( SELECT min(TVEpisodeID)
FROM TVEpisode
WHERE watched=0
GROUP BY TVSeriesID)

Related

SQL query (Postgres) how to answer that?

I have a table with company IDs (not unique) and an attribute (let's call it status ID). A status can be anything from 1 to 18. The relationship is many-to-many; only the row ID is unique.
Now I need to get the companies that only have rows with status 1 or 18. If a company has any other status as well (let's say 3), it should not be returned.
The data is stored as row ID, some metadata, company ID and one status ID. The example below is AFTER I ran a group by query.
So as an example if I do group by and string agg, I am getting these values:
Company ID Status
1 1,9,12,18
2 12,13,18
3 1
4 8
5 18
So in this case I need to return only 3 and 5.
You should fix your data model. Here are some reasons:
Storing numbers in strings is BAD.
Storing multiple values in a string is BAD.
SQL has poor string processing capabilities.
Postgres offers many ways to store multiple values -- a junction table, arrays, and JSON come to mind.
For your particular problem, how about an explicit comparison?
where status in ('1', '18', '1,18', '18,1')
You can group by companyid and set 2 conditions in the having clause:
select companyid
from tablename
group by companyid
having
sum((status in (1, 18))::int) > 0
and
sum((status not in (1, 18))::int) = 0
Or with EXCEPT:
select companyid from tablename
except
select companyid from tablename
where status not in (1, 18)
See the demo.
Results:
> | companyid |
> | --------: |
> | 3 |
> | 5 |
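As a rough check of the HAVING approach, the sketch below loads the five companies from the question into an in-memory SQLite database, one row per status. SQLite is only a stand-in for Postgres here: it evaluates `status IN (...)` to 0 or 1 directly, so the `::int` cast is not needed. The `id` column is an assumed surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename ("
    "id INTEGER PRIMARY KEY, companyid INTEGER, status INTEGER)"
)
# One row per (company, status), expanded from the grouped example
# shown in the question.
statuses_by_company = {
    1: [1, 9, 12, 18],
    2: [12, 13, 18],
    3: [1],
    4: [8],
    5: [18],
}
conn.executemany(
    "INSERT INTO tablename (companyid, status) VALUES (?, ?)",
    [(c, s) for c, statuses in statuses_by_company.items() for s in statuses],
)

# Keep companies with at least one row in (1, 18) and no row outside it.
only_1_or_18 = [row[0] for row in conn.execute("""
    SELECT companyid
    FROM tablename
    GROUP BY companyid
    HAVING sum(status IN (1, 18)) > 0
       AND sum(status NOT IN (1, 18)) = 0
    ORDER BY companyid
""")]

print(only_1_or_18)
```

A company holding both 1 and 18 (and nothing else) would also pass these two conditions.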
You can utilize group by and having. ie:
select *
from myTable
where statusId in (1,18)
and companyId in (select companyId
from myTable
group by companyId
having count(distinct statusId) = 1);
EDIT: If you meant to include those who have 1,18 and 18,1 too, then you could use array_agg instead:
select *
from t t1
inner join
(select companyId, array_agg(statusId) as statuses
from t
group by companyId
) t2 on t1.companyid = t2.companyid
where array[1,18] @> t2.statuses;
EDIT: If you meant to get back only companyIds without the rest of columns and data:
select companyId
from t
group by companyId
having array[1,18] @> array_agg(statusId);

Select distinct value and bring only the latest one

I have a table that stores different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, approved, etc).
I need to build a query that brings only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Here's the architecture for the 3 tables:
When running a simple SELECT statement on table dbo.Cuotas_Estado you'll get:
SELECT
*
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This will bring this result:
So, as you can see, it returns two rows for entry 9 and for entry 11. I need the query to return only the latest IdEstado for each entry (3 for entry 9 and 4 for entry 11).
Can you try this?
with cte as (
select IdEstado,IdCuota,
row_number() over(partition by IdCuota order by fecha desc) as RowNum
from [dbo].[Cuotas_Estado]
)
select IdEstado,IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method.
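The correlated subquery can be tried out with a small sketch like the one below, run against in-memory SQLite rather than SQL Server. The earlier statuses for IdCuota 9 and 11 are made up, since the original screenshots are not available, and the query assumes, as the answer does, that the highest IdEstado is the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cuotas_Estado ("
    "ID INTEGER PRIMARY KEY, IdCuota INTEGER, IdEstado INTEGER)"
)
# Hypothetical history: entries 9 and 11 each have an older status row.
conn.executemany(
    "INSERT INTO Cuotas_Estado (IdCuota, IdEstado) VALUES (?, ?)",
    [(2, 1), (3, 2), (9, 1), (9, 3), (10, 3), (11, 2), (11, 4)],
)

# For each row, keep it only if its IdEstado is the maximum
# among rows sharing the same IdCuota.
latest_status = conn.execute("""
    SELECT e.IdCuota, e.IdEstado
    FROM Cuotas_Estado e
    WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
                        FROM Cuotas_Estado e2
                        WHERE e2.IdCuota = e.IdCuota)
    ORDER BY e.IdCuota
""").fetchall()

print(latest_status)
```

If statuses can move backwards (for example from approved back to pending), the maximum IdEstado is no longer the latest status and an ordering column such as a timestamp would be needed instead.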

Finding the most recent date in SQL for a range of rows

I have a table of course work marks, with the table headings:
Module code, coursework numbers, student, date submitted, mark
Sample data in order of table headings:
Maths, 1, Parry, 12-JUN-92, 20
Maths, 2, Parry, 13-JUN-92, 20
Maths, 2, Parry, 15-JUN-92, 25
Expected data after query
Maths, 1, Parry, 12-JUN-92, 20
Maths, 2, Parry, 15-JUN-92, 25
Sometimes a student retakes an exam and they have an additional row for a piece of coursework.
I need to get only the latest coursework entries in a table. The following works when I isolate a particular student:
SELECT *
FROM TABLE
WHERE NAME = 'NAME'
AND DATE IN (SELECT MAX(DATE)
FROM TABLE
WHERE NAME = 'NAME'
GROUP BY MODULE_CODE, COURSEWORK_NUMBER, STUDENT)
This provides the correct solution for that person, giving me the most recent dates for each row (each coursework) in the table. However, this:
SELECT *
FROM TABLE
WHERE DATE IN (SELECT MAX(DATE)
FROM TABLE
GROUP BY MODULE_CODE, COURSEWORK_NUMBER, STUDENT)
Does not provide me with the same table but for every person who has attempted the coursework. Where am I going wrong? Sorry if the details are a bit sparse, but I’m worried about plagiarism.
I'm working with SQL*Plus.
This is a good spot to use Oracle keep syntax:
select
module_code,
course_work_number,
student,
max(date_submitted) date_submitted,
max(mark) keep(dense_rank first order by date_submitted desc) mark
from mytable
group by module_code, course_work_number, student
Demo on DB Fiddle:
MODULE_CODE | COURSE_WORK_NUMBER | STUDENT | DATE_SUBMITTED | MARK
:---------- | -----------------: | :------ | :------------- | ---:
Maths | 1 | Parry | 12-JUN-92 | 20
Maths | 2 | Parry | 15-JUN-92 | 25
You are looking for a groupwise maximum. See this article from MySQL:
https://dev.mysql.com/doc/refman/8.0/en/example-maximum-column-group-row.html
I'm not sure about the correct syntax for Oracle, but it should be similar. At least the query structure should put you on the right path.
You could use the row_number function to solve this:
select x.*
from (SELECT a.*, row_number() over(partition by module_code, coursework_number, student order by date desc) as row1
FROM TABLE a) x
where x.row1 = 1
The idea is to assign a row number based on the date and then select the cases where row number is 1. Hope this helps.
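Here is a minimal sketch of the row-number idea using the three sample rows from the question, run on in-memory SQLite instead of Oracle. The table name `marks` and the column names are assumptions, and the dates are stored as ISO strings so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE marks (
        module_code       TEXT,
        coursework_number INTEGER,
        student           TEXT,
        date_submitted    TEXT,   -- ISO 8601 so text order = date order
        mark              INTEGER
    )
""")
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?, ?, ?)",
    [
        ("Maths", 1, "Parry", "1992-06-12", 20),
        ("Maths", 2, "Parry", "1992-06-13", 20),
        ("Maths", 2, "Parry", "1992-06-15", 25),
    ],
)

# Number the submissions within each (module, coursework, student)
# group from newest to oldest and keep only the newest.
latest_marks = conn.execute("""
    SELECT module_code, coursework_number, student, date_submitted, mark
    FROM (SELECT m.*,
                 row_number() OVER (
                     PARTITION BY module_code, coursework_number, student
                     ORDER BY date_submitted DESC) AS rn
          FROM marks m) AS x
    WHERE rn = 1
    ORDER BY module_code, coursework_number, student
""").fetchall()

print(latest_marks)
```

Partitioning by all three grouping columns is what keeps one row per piece of coursework per student, rather than one row per student.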

get max of max in select function

so I have 3 tables :
parkingZone -
ZID - zone id
Name - name of the zone
maxprice - max price of the parking zone
pricePerHour
carParking -
CID - the id of the car which parking
StartTime - start time of parking
EndTime - end time of parking
ParkingZoneID - zone ID (same as ZID in parkingzone)
Cost - how much the parking cost
Cars -
CID - same as CID in carParking
ID - ID of who owns the car
cellPhone - cellPhone of who owns the car
now I need to find the ID and CID of who has the max "cost" of the max "maxprice",
In other words, I need to find the ZID of the maximum "maxprice"
and then to find the ID and CID of the maximum "cost" related to "ZID"
so I managed to find all the CID that relates to the ZID:
select CarParking.CID, CarParking.Cost
from CarParking
inner join (select ParkingArea.AID
from ParkingArea
inner join(
select max(ParkingArea.maxpriceperday) maxpriceperday
from ParkingArea
)maxrow on maxrow.maxpriceperday = ParkingArea.maxpriceperday)maxCid on maxCid.AID= CarParking.ParkingAreaID
but how can I get the maximum cost, and then the CID AND ID from Cars table?
Important note: there can be more than one max both in "maxpriceperday" and "Cost",
which means there could be more than one ZID with the max maxpriceperday (if they are equal)
and more than one CID with the maximum cost for each of those ZIDs (if the costs are equal),
so using "TOP" or "LIMIT" will not work.
for example:
how can I accomplish that?
thanks
This would be my approach:
First, select all ZIDs with the highest maxprice using a dense_rank. Next, use a second dense_rank to get all CIDs with the highest cost from the selected ZIDs. Finally, use the found CIDs to get the car data.
That gives the CID's and ID's of all cars that have the highest (equal) cost in all lots with the highest maxprice.
If the dense_rank is new to you, you can read about it here
Gathered in one query:
SELECT C.CID
, C.ID
FROM Cars AS C
INNER JOIN (
SELECT CID
, Cost
, DENSE_RANK() over (ORDER BY Cost DESC) AS orderedCosts
FROM carParking AS CP
INNER JOIN (SELECT ZID
, DENSE_RANK() over (ORDER BY maxprice DESC) AS orderedMaxprice
FROM ParkingArea
) AS PA
ON PA.ZID= CP.ParkingAreaID
AND orderedMaxprice = 1
) as cars_most_costs
ON cars_most_costs.CID = C.CID
AND cars_most_costs.orderedCosts = 1
A dense_rank works like this:
ZID | maxprice| dense_rank
1 | 1000 | 1
3 | 1000 | 1
2 | 500 | 2
4 | 400 | 3
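The small table above can be reproduced with the following sketch, which runs dense_rank() on the same four values in an in-memory SQLite database (a stand-in for the real server; the table name `zones` is hypothetical). A rank() column is included only to show how the two functions differ after a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (ZID INTEGER PRIMARY KEY, maxprice INTEGER)")
# Same four zones as in the illustration above.
conn.executemany(
    "INSERT INTO zones VALUES (?, ?)",
    [(1, 1000), (2, 500), (3, 1000), (4, 400)],
)

# dense_rank() gives tied rows the same number and leaves no gap after
# them; rank() also ties them but skips ahead afterwards.
ranked_zones = conn.execute("""
    SELECT ZID,
           maxprice,
           dense_rank() OVER (ORDER BY maxprice DESC) AS dense_rnk,
           rank()       OVER (ORDER BY maxprice DESC) AS plain_rnk
    FROM zones
    ORDER BY dense_rnk, ZID
""").fetchall()

for row in ranked_zones:
    print(row)
```

Filtering on the first rank keeps every tied row, which is why this approach handles several zones sharing the highest maxprice where TOP or LIMIT would not.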
Using your paper example:
First step gets ZID 1 and 3, which both have the highest maxprice.
Next step gets CID 1010 and 1011, which are the cars with the highest cost on parkingzoneID's 1 and/or 3.
Final step returns CID/ID combo's 1010/2000 and 1011/2001.
The result you provided is actually wrong, because CID 1014 has a cost of 10 while the other two have 20.
If you meant max cost per parkingzoneID, then the question was not very clear, but you only have to change one line:
, DENSE_RANK() over (PARTITION BY ZID ORDER BY Cost DESC) AS orderedCosts
This will also return car 1014/2004

Complex rank in SQL using Postgres

I'm in over my head with the SQL needed for a complex rank function. This is an app for a racing sport where I need to rank each Entry for a Timesheet based on the entry's :total_time.
The relevant models:
class Timesheet
has_many :entries
end
class Entry
belongs_to :timesheet
belongs_to :athlete
end
class Run
belongs_to :entry
end
An Entry's :total_time isn't stored in the database. It's a calculated column of runs.sum(:finish). I use the Postgres (9.3) rank() function to get Entries for a given Timesheet and rank them by this calculated column.
def ranked_entries
Entry.find_by_sql([
"SELECT *, rank() OVER (ORDER BY total_time asc)
FROM(
SELECT Entries.id, Entries.timesheet_id, Entries.athlete_id,
SUM(Runs.finish) AS total_time
FROM Entries
INNER JOIN Runs ON (Entries.id = Runs.entry_id)
GROUP BY Entries.id) AS FinalRanks
WHERE timesheet_id = ?", self.id])
end
So far so good. This returns my entry objects with a rank attribute which I can display on timesheet#show.
Now the tricky part. On a Timesheet, not every Entry will have the same number of runs. There is a cutoff (usually Top-20 but not always). This renders the rank() from Postgres inaccurate because some Entries will have a lower :total_time than the race winner because they didn't make the cutoff for the second heat.
My Question: Is it possible to do something like a rank() within a rank() to produce a table that looks like the one below? Or is there another preferred way? Thanks!
Note: I store times as integers, but I formatted them as the more familiar MM:SS in the simplified table below for clarity
| rank | entry_id | total_time |
|------|-----------|------------|
| 1 | 6 | 1:59.05 |
| 2 | 3 | 1:59.35 |
| 3 | 17 | 1:59.52 |
|......|...........|............|
| 20 | 13 | 56.56 | <- didn't make the top-20 cutoff, only has one run.
Let's create a table. (Get in the habit of including CREATE TABLE and INSERT statements in all your SQL questions.)
create table runs (
entry_id integer not null,
run_num integer not null
check (run_num between 1 and 3),
run_time interval not null
);
insert into runs values
(1, 1, '00:59.33'),
(2, 1, '00:59.93'),
(3, 1, '01:03.27'),
(1, 2, '00:59.88'),
(2, 2, '00:59.27');
This SQL statement will give you the totals in the order you want, but without ranking them.
with num_runs as (
select entry_id, count(*) as num_runs
from runs
group by entry_id
)
select r.entry_id, n.num_runs, sum(r.run_time) as total_time
from runs r
inner join num_runs n on n.entry_id = r.entry_id
group by r.entry_id, n.num_runs
order by num_runs desc, total_time asc
entry_id num_runs total_time
--
2 2 00:01:59.2
1 2 00:01:59.21
3 1 00:01:03.27
This statement adds a column for rank.
with num_runs as (
select entry_id, count(*) as num_runs
from runs
group by entry_id
)
select
rank() over (order by num_runs desc, sum(r.run_time) asc),
r.entry_id, n.num_runs, sum(r.run_time) as total_time
from runs r
inner join num_runs n on n.entry_id = r.entry_id
group by r.entry_id, n.num_runs
order by rank asc
rank entry_id num_runs total_time
--
1 2 2 00:01:59.2
2 1 2 00:01:59.21
3 3 1 00:01:03.27
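The same ranking can be sketched with integer times, which is how the question says they are stored. The example below uses in-memory SQLite as a stand-in for Postgres, a hypothetical `finish` column holding hundredths of a second, and the five sample runs from above converted to that unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE runs (
        entry_id INTEGER NOT NULL,
        run_num  INTEGER NOT NULL,
        finish   INTEGER NOT NULL   -- hundredths of a second (assumption)
    )
""")
# The five sample runs, e.g. 00:59.33 becomes 5933.
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [(1, 1, 5933), (2, 1, 5993), (3, 1, 6327), (1, 2, 5988), (2, 2, 5927)],
)

# Aggregate per entry first, then rank: more runs beats fewer runs,
# and within the same number of runs the lower total time wins.
ranked_entries = conn.execute("""
    WITH totals AS (
        SELECT entry_id,
               count(*)    AS num_runs,
               sum(finish) AS total_time
        FROM runs
        GROUP BY entry_id
    )
    SELECT rank() OVER (ORDER BY num_runs DESC, total_time ASC) AS rnk,
           entry_id, num_runs, total_time
    FROM totals
    ORDER BY rnk
""").fetchall()

for row in ranked_entries:
    print(row)
```

Entry 3 has the lowest total but only one run, so it ranks behind both entries that completed two runs.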