SQL: select max values from two related tables - sql

I have three tables: Cafe, Waiter and WaiterDetail. A Cafe can have many Waiters, and a Waiter can have many WaiterDetails. For the first-hired Waiter (by HiredDate), I would like to find the oldest WaiterDetail (by CreatedDate).
Cafe:
*CafeId(primary)
Waiter:
*WaiterId(primary)
*CafeId
*HiredDate
WaiterDetail:
*WaiterDetailID(primary)
*WaiterId
*CreatedDate
What would the query look like for Oracle and MS SQL Server?

If I understand correctly, for each Cafe you want the Waiter with the earliest HiredDate and, for that waiter, the detail with the earliest CreatedDate. For Oracle (and SQL Server, which requires the alias on the derived table), something like this:
SELECT *
FROM (
    SELECT w.CafeId,
           w.WaiterId,   -- listed explicitly: w.* plus d.* would expose WaiterId twice
           w.HiredDate,
           d.WaiterDetailID,
           d.CreatedDate,
           ROW_NUMBER() OVER ( PARTITION BY w.CafeId
                               ORDER BY w.HiredDate ASC,
                                        d.CreatedDate ASC ) AS rn
    FROM Waiter w
    INNER JOIN WaiterDetail d
        ON ( w.WaiterId = d.WaiterId )
) t
WHERE rn = 1;
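As a rough way to try the ROW_NUMBER() approach without an Oracle or SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. The sample rows are invented for illustration, and it assumes the bundled SQLite is version 3.25 or later (needed for window functions).

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Waiter (WaiterId INTEGER PRIMARY KEY, CafeId INTEGER, HiredDate TEXT);
CREATE TABLE WaiterDetail (WaiterDetailID INTEGER PRIMARY KEY, WaiterId INTEGER, CreatedDate TEXT);
INSERT INTO Waiter VALUES (1, 10, '2020-03-01'), (2, 10, '2019-01-15'), (3, 20, '2021-06-01');
INSERT INTO WaiterDetail VALUES
  (100, 1, '2020-03-02'), (101, 2, '2019-02-01'),
  (102, 2, '2019-01-20'), (103, 3, '2021-06-05');
""")

# Number the joined rows per cafe: earliest hire first, then earliest detail.
query = """
SELECT CafeId, WaiterId, WaiterDetailID, CreatedDate
FROM (
  SELECT w.CafeId, w.WaiterId, d.WaiterDetailID, d.CreatedDate,
         ROW_NUMBER() OVER (PARTITION BY w.CafeId
                            ORDER BY w.HiredDate ASC, d.CreatedDate ASC) AS rn
  FROM Waiter w
  INNER JOIN WaiterDetail d ON w.WaiterId = d.WaiterId
) t
WHERE rn = 1
ORDER BY CafeId;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

One thing to keep in mind: because of the INNER JOIN, a waiter with no WaiterDetail rows is skipped, so if the first-hired waiter has no details the query falls through to the next waiter in that cafe.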

Related

select rows in sql with latest date from 3 tables in each group

I'm creating PREDICATE system for my application.
How can I select the rows with the latest "Taken On" date for each "QuizESId"? I understand how to do this with a single table, which I learned from this question:
select rows in sql with latest date for each ID repeated multiple times
Here is what I have already tried
SELECT tt.*
FROM myTable tt
INNER JOIN
    (SELECT ID, MAX(Date) AS MaxDateTime
     FROM myTable
     GROUP BY ID) groupedtt ON tt.ID = groupedtt.ID
                           AND tt.Date = groupedtt.MaxDateTime
What confuses me is how to do this across 3 tables. I hope you can guide me; I need a solution with a clean query and efficient performance.
Thanks
This is for SQL Server (you didn't specify exactly which RDBMS you're using):
If you want to get the "latest row for each QuizId", this sounds like you need a CTE (Common Table Expression) with a ROW_NUMBER() value, something like this (updated: you obviously want to "partition" not just by QuizId, but also by UserName):
WITH BaseData AS
(
    SELECT
        mAttempt.Id AS Id,
        mAttempt.QuizModelId AS QuizId,
        mAttempt.StartedAt AS StartsOn,
        mUser.UserName,
        mDetail.Score AS Score,
        RowNum = ROW_NUMBER() OVER (PARTITION BY mAttempt.QuizModelId, mUser.UserName
                                    ORDER BY mAttempt.TakenOn DESC)
    FROM
        UserQuizAttemptModels mAttempt
    INNER JOIN
        AspNetUsers mUser ON mAttempt.UserId = mUser.Id
    INNER JOIN
        QuizAttemptDetailModels mDetail ON mDetail.UserQuizAttemptModelId = mAttempt.Id
)
SELECT *
FROM BaseData
WHERE QuizId = 10053
  AND RowNum = 1
The BaseData CTE basically selects the data (as you did), but it also adds a ROW_NUMBER() column. This "partitions" your data into groups, one per combination of QuizModelId and UserName, and numbers the rows inside each group, starting at 1, in the order given by the ORDER BY clause. You said you want to order by the "Taken On" date, but no such column is visible in your query, so I guessed it might be on the UserQuizAttemptModels table; change and adapt as needed.
Now you can select from that CTE with your original WHERE condition and specify that you want only the first row of each group (for each QuizId and UserName), the one with the most recent "Taken On" date value.
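To see the partition-and-number behaviour on concrete rows, here is a small sketch using Python's sqlite3 module. The table names come from the answer; the sample users, dates and scores are made up, each attempt is assumed to have exactly one detail row, and SQLite 3.25 or later is assumed for window functions. SQLite does not accept the T-SQL `RowNum = ...` form, so the alias is written with AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AspNetUsers (Id INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE UserQuizAttemptModels (
  Id INTEGER PRIMARY KEY, QuizModelId INTEGER, UserId INTEGER, TakenOn TEXT);
CREATE TABLE QuizAttemptDetailModels (
  Id INTEGER PRIMARY KEY, UserQuizAttemptModelId INTEGER, Score INTEGER);
-- Hypothetical rows: ann took quiz 10053 twice, bob once.
INSERT INTO AspNetUsers VALUES (1, 'ann'), (2, 'bob');
INSERT INTO UserQuizAttemptModels VALUES
  (1, 10053, 1, '2019-01-01'), (2, 10053, 1, '2019-02-01'), (3, 10053, 2, '2019-01-15');
INSERT INTO QuizAttemptDetailModels VALUES (1, 1, 40), (2, 2, 75), (3, 3, 60);
""")

# RowNum restarts at 1 for every (quiz, user) pair; newest attempt gets 1.
query = """
WITH BaseData AS (
  SELECT a.QuizModelId AS QuizId, u.UserName, a.TakenOn, d.Score,
         ROW_NUMBER() OVER (PARTITION BY a.QuizModelId, u.UserName
                            ORDER BY a.TakenOn DESC) AS RowNum
  FROM UserQuizAttemptModels a
  INNER JOIN AspNetUsers u ON a.UserId = u.Id
  INNER JOIN QuizAttemptDetailModels d ON d.UserQuizAttemptModelId = a.Id
)
SELECT UserName, TakenOn, Score
FROM BaseData
WHERE QuizId = 10053 AND RowNum = 1
ORDER BY UserName;
"""
latest = conn.execute(query).fetchall()
print(latest)
```

If an attempt could have several detail rows, the join would multiply rows within a partition and RowNum = 1 would keep only one of them, so the partitioning or ordering would need to account for that.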

What is the most efficient way to find the first and last entry of an entity in SQL?

I was asked this question in an interview. A table, trips, contains the following columns (customer_id, start_from, end_at, start_at_time, end_at_time), with each trip stored as a separate row, and a part of the table looks like this: How would you find the list of all the customers who started yesterday from point A and ended yesterday at point P?
I provided a solution using window functions that identified all customers who started their day at A, and then inner joined that list with the customers who ended their day at P (using the same window functions).
The solution I gave was this:
SELECT a.customer_id
FROM
  (SELECT a.customer_id
   FROM
     (SELECT customer_id,
             start_from,
             row_number() OVER (PARTITION BY customer_id
                                ORDER BY start_at_time ASC) AS rnk
      FROM trips
      WHERE to_date(start_at_time)= date_sub(CURRENT_DATE, 1) ) AS a
   WHERE a.rnk=1
     AND a.start_from='A' ) AS a
INNER JOIN
  (SELECT a.customer_id
   FROM
     (SELECT customer_id,
             end_at,
             row_number() OVER (PARTITION BY customer_id
                                ORDER BY end_at_time DESC) AS rnk
      FROM trips
      WHERE to_date(end_at_time)= date_sub(CURRENT_DATE, 1) ) AS a
   WHERE a.rnk=1
     AND a.end_at='P' ) AS b ON a.customer_id=b.customer_id
My interviewer said my solution was correct but that there is a more efficient way to solve this problem. I've been searching for a more efficient way but have not found one so far. Can you suggest one?
I might use first_value() for this:
select t.customer_id
from (select t.*,
             first_value(start_from) over (partition by customer_id order by start_at_time) as first_start,
             first_value(end_at) over (partition by customer_id order by start_at_time desc) as last_end
      from trips t
      where start_at_time >= date_sub(CURRENT_DATE, 1) and
            start_at_time < CURRENT_DATE
     ) t
where first_start = start_from and -- just some filtering so select distinct is not needed
      first_start = 'A' and
      last_end = 'P';
I should add that many databases support an equivalent function for aggregation, and I would use that instead.
This assumes that starts are not repeated. To be safe, you can add select distinct, but there is a performance hit for that.
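The first_value() idea can be tried on sample data with a sketch like the following, run through Python's sqlite3 module. The trips rows are invented, the "yesterday" window is passed in as two hypothetical parameters (day_start, day_end) so the result does not depend on the current date, and SELECT DISTINCT is used instead of the row filter, which is the safer but potentially slower variant mentioned above. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT,
                    start_at_time TEXT, end_at_time TEXT);
-- Hypothetical trips: customer 1 goes A -> B -> P, customer 2 ends the day at C,
-- customer 3 travelled on a different day.
INSERT INTO trips VALUES
  (1, 'A', 'B', '2024-05-01 08:00', '2024-05-01 09:00'),
  (1, 'B', 'P', '2024-05-01 10:00', '2024-05-01 11:00'),
  (2, 'A', 'P', '2024-05-01 08:30', '2024-05-01 09:30'),
  (2, 'P', 'C', '2024-05-01 12:00', '2024-05-01 13:00'),
  (3, 'A', 'P', '2024-04-30 08:00', '2024-04-30 09:00');
""")

# One pass over the day's trips: first start point and last end point per customer.
query = """
SELECT DISTINCT customer_id
FROM (
  SELECT customer_id,
         FIRST_VALUE(start_from) OVER (PARTITION BY customer_id
                                       ORDER BY start_at_time) AS first_start,
         FIRST_VALUE(end_at) OVER (PARTITION BY customer_id
                                   ORDER BY start_at_time DESC) AS last_end
  FROM trips
  WHERE start_at_time >= :day_start AND start_at_time < :day_end
) t
WHERE first_start = 'A' AND last_end = 'P'
ORDER BY customer_id;
"""
params = {"day_start": "2024-05-01", "day_end": "2024-05-02"}
customers = [row[0] for row in conn.execute(query, params)]
print(customers)
```

Compared with the two ROW_NUMBER() subqueries joined together, this reads the table once and needs no join, which is presumably the kind of saving the interviewer had in mind.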
A generalized version of what I would probably have done:
SELECT fandl.a
FROM (
    SELECT a, MIN(start) AS t0, MAX(start) AS tN
    FROM someTable
    WHERE start >= DATE_SUB(CURRENT_DATE, 1) AND start < CURRENT_DATE
    GROUP BY a
) AS fandl
INNER JOIN someTable AS st0 ON fandl.a = st0.a AND fandl.t0 = st0.start
INNER JOIN someTable AS stN ON fandl.a = stN.a AND fandl.tN = stN.start
WHERE st0.b1 = 'A' AND stN.b2 = 'P'
;
I used the same date function you did, since you did not specify the SQL dialect.
Note that, in many RDBMS, if there is an (a, start) index, the subquery and joins can be done with the index alone; actual table access would only be required for the final WHERE evaluation.
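For comparison, the MIN/MAX-and-join version can be sketched against the trips table as follows. The mapping of the generic names is my assumption (a is customer_id, start is start_at_time, b1 is start_from, b2 is end_at), the rows and the index name are hypothetical, and the date window is again passed as parameters rather than computed from the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (customer_id INTEGER, start_from TEXT, end_at TEXT,
                    start_at_time TEXT, end_at_time TEXT);
-- The (a, start) index the answer refers to; the name is made up.
CREATE INDEX trips_customer_start ON trips (customer_id, start_at_time);
INSERT INTO trips VALUES
  (1, 'A', 'B', '2024-05-01 08:00', '2024-05-01 09:00'),
  (1, 'B', 'P', '2024-05-01 10:00', '2024-05-01 11:00'),
  (2, 'A', 'P', '2024-05-01 08:30', '2024-05-01 09:30'),
  (2, 'P', 'C', '2024-05-01 12:00', '2024-05-01 13:00'),
  (4, 'A', 'P', '2024-05-01 07:00', '2024-05-01 07:30');
""")

# Find each customer's first and last start time, then join back to read
# the start point of the first trip and the end point of the last trip.
query = """
SELECT fandl.customer_id
FROM (
  SELECT customer_id, MIN(start_at_time) AS t0, MAX(start_at_time) AS tN
  FROM trips
  WHERE start_at_time >= :day_start AND start_at_time < :day_end
  GROUP BY customer_id
) AS fandl
INNER JOIN trips AS st0
  ON fandl.customer_id = st0.customer_id AND fandl.t0 = st0.start_at_time
INNER JOIN trips AS stN
  ON fandl.customer_id = stN.customer_id AND fandl.tN = stN.start_at_time
WHERE st0.start_from = 'A' AND stN.end_at = 'P'
ORDER BY fandl.customer_id;
"""
params = {"day_start": "2024-05-01", "day_end": "2024-05-02"}
customers = [row[0] for row in conn.execute(query, params)]
print(customers)
```

Customer 4 has a single trip that day, so st0 and stN resolve to the same row and the customer still qualifies.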

SQL plus, top 3 rank across two tables

I'm trying to find a way to query the top three users in a database in terms of number of listens and output their user ID and their rank.
The schema for the two tables in question is as follows :
User(user_id, email, first_name, last_name, password, created_on, last_sign_in)
PreviouslyPlayed(user_id, track_id, timestamp)
I can see how people would pull this off with a count query, but I am wondering if there's a way to do this with a rank or dense rank.
If you just want the user id and are using Oracle 12c+, then you can do:
select pp.user_id, rank() over (order by count(*) desc) as therank
from previouslyplayed pp
group by pp.user_id
order by count(*) desc
fetch first 3 rows only;
In earlier versions, you would use a subquery:
select pp.*
from (select pp.user_id, rank() over (order by count(*) desc) as therank
      from previouslyplayed pp
      group by pp.user_id
     ) pp
where therank <= 3;
You might want to review row_number(), rank(), and dense_rank() to be sure you are getting what you really want (the difference is in how they handle ties).
You only need the join if you are concerned that something called user_id in one table is not a valid user id. That seems unlikely, in any well-designed database.
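To make the tie-handling difference concrete, here is a small sketch that computes all three functions side by side over invented listen counts, using Python's sqlite3 module (SQLite 3.25 or later assumed). As a conservative choice for SQLite, the COUNT(*) is done in a derived table first and the window functions are applied to its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PreviouslyPlayed (user_id INTEGER, track_id INTEGER, timestamp TEXT);
-- Hypothetical listens: user 1 has 3, users 2 and 3 tie with 2, user 4 has 1.
INSERT INTO PreviouslyPlayed VALUES
  (1, 1, 't1'), (1, 2, 't2'), (1, 3, 't3'),
  (2, 1, 't1'), (2, 2, 't2'),
  (3, 1, 't1'), (3, 2, 't2'),
  (4, 1, 't1');
""")

query = """
SELECT user_id, listens,
       ROW_NUMBER() OVER (ORDER BY listens DESC, user_id) AS rn,
       RANK()       OVER (ORDER BY listens DESC)          AS rnk,
       DENSE_RANK() OVER (ORDER BY listens DESC)          AS dense_rnk
FROM (SELECT user_id, COUNT(*) AS listens
      FROM PreviouslyPlayed
      GROUP BY user_id) c
ORDER BY listens DESC, user_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # (user_id, listens, row_number, rank, dense_rank)
```

With this data, a filter of `rank <= 3` returns users 1, 2 and 3, `dense_rank <= 3` returns all four users, and `row_number <= 3` returns exactly three rows but breaks the tie by whatever extra ordering is supplied (user_id here).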

Over clause in SQL Server

I have the following query
select * from
(
    SELECT distinct
        rx.patid
        ,rx.fillDate
        ,rx.scriptEndDate
        ,MAX(datediff(day, rx.filldate, rx.scriptenddate)) AS longestScript
        ,rx.drugClass
        ,COUNT(rx.drugName) over(partition by rx.patid,rx.fillDate,rx.drugclass) as distinctFamilies
    FROM [I 3 SCI control].dbo.rx
    where rx.drugClass in ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
    GROUP BY rx.patid, rx.fillDate, rx.scriptEndDate,rx.drugName,rx.drugClass
) r
order by distinctFamilies desc
which produces results that look like
This should mean that, for the patID and the two dates shown in the table, there are 5 unique drug names. However, when I run the following query:
select distinct *
from rx
where patid = 1358801781 and fillDate between '2008-10-17' and '2008-11-16' and drugClass='H4B'
I have a result set returned that looks like
You can see that while there are in fact five rows returned for the second query between the dates of 2008-10-17 and 2009-01-15, there are only three unique names. I've tried various ways of modifying the over clause, all with different levels of non-success. How can I alter my query so that I only find unique drugNames within the timeframe specified for each row?
Taking a shot at it:
SELECT DISTINCT
    patid,
    fillDate,
    scriptEndDate,
    MAX(DATEDIFF(day, fillDate, scriptEndDate)) AS longestScript,
    drugClass,
    MAX(rn) OVER(PARTITION BY patid, fillDate, drugClass) as distinctFamilies
FROM (
    SELECT patid, fillDate, scriptEndDate, drugClass,rx.drugName,
        DENSE_RANK() OVER(PARTITION BY patid, fillDate, drugClass ORDER BY drugName) as rn
    FROM [I 3 SCI control].dbo.rx
    WHERE drugClass IN ('h3a','h6h','h4b','h2f','h2s','j7c','h2e')
)x
GROUP BY x.patid, x.fillDate, x.scriptEndDate,x.drugName,x.drugClass,x.rn
ORDER BY distinctFamilies DESC
Not sure if DISTINCT is really necessary - left it in since you've used it.
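The trick in this answer is that the highest DENSE_RANK() within a partition equals the number of distinct values in that partition, which stands in for a windowed COUNT(DISTINCT ...). A stripped-down sketch of just that part, with a simplified hypothetical rx table and made-up drug names, run through Python's sqlite3 module (SQLite 3.25 or later assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rx (patid INTEGER, fillDate TEXT, drugClass TEXT, drugName TEXT);
-- Hypothetical rows: five prescriptions, but only three distinct drug names.
INSERT INTO rx VALUES
  (1, '2008-10-17', 'H4B', 'drugA'),
  (1, '2008-10-17', 'H4B', 'drugA'),
  (1, '2008-10-17', 'H4B', 'drugB'),
  (1, '2008-10-17', 'H4B', 'drugC'),
  (1, '2008-10-17', 'H4B', 'drugC');
""")

# COUNT(...) OVER counts rows; MAX(DENSE_RANK) OVER counts distinct names.
query = """
SELECT DISTINCT patid, fillDate, drugClass,
       COUNT(drugName) OVER (PARTITION BY patid, fillDate, drugClass) AS row_count,
       MAX(rn)         OVER (PARTITION BY patid, fillDate, drugClass) AS distinctFamilies
FROM (
  SELECT patid, fillDate, drugClass, drugName,
         DENSE_RANK() OVER (PARTITION BY patid, fillDate, drugClass
                            ORDER BY drugName) AS rn
  FROM rx
) x;
"""
result = conn.execute(query).fetchall()
print(result)
```

The row_count column reproduces the number the original query reported (5), while distinctFamilies gives the number of unique names (3).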

T-SQL: Update first row of recordset

I have a query (A) that can return multiple rows in date order:
SELECT encounter_id, department_id, effective_time
FROM adt
WHERE event_type IN (1,3,7)
ORDER BY effective_time
I have another query (B) that returns a single row:
SELECT encounter_id, department_id, arrival_time
FROM ed
WHERE event_type = 50
I would like to join the query B to query A, in such a way that query B's single row will be associated with query A's first record.
I realize that I could do this with a CURSOR, but I was hoping to use T-SQL row_number() function.
Not sure if I got the question right.
Let me know if the solution below is different from what you were expecting.
SELECT *
FROM
(
    SELECT TOP 1
        encounter_id, department_id, effective_time
    FROM adt
    WHERE event_type IN (1,3,7)
    ORDER BY effective_time
) adt1,
(
    SELECT encounter_id, department_id, arrival_time
    FROM ed
    WHERE event_type = 50
) ed1
Then you can join the two derived tables as needed, using a WHERE clause.
Regards,
Niyaz
I found my answer:
row_number() OVER (PARTITION BY encounter_id ORDER BY encounter_id, effective_time) row.
Unfortunately, the database has data-quality issues that prevent me from approaching the solution this way.
Thanks for your assistance.
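For readers with cleaner data, one way the row_number() idea could be wired up is sketched below with Python's sqlite3 module. It assumes the two queries are related through encounter_id, which the question does not state explicitly, and the sample rows are invented. The ed row is attached only to the adt row numbered 1 within each encounter; all other adt rows get NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adt (encounter_id INTEGER, department_id INTEGER,
                  effective_time TEXT, event_type INTEGER);
CREATE TABLE ed (encounter_id INTEGER, department_id INTEGER,
                 arrival_time TEXT, event_type INTEGER);
-- Hypothetical rows: encounter 1 has two adt events and one ed arrival.
INSERT INTO adt VALUES (1, 5, '08:00', 1), (1, 6, '09:00', 3), (2, 5, '10:00', 7);
INSERT INTO ed VALUES (1, 9, '07:45', 50);
""")

# Number query A's rows per encounter, then join query B only where rn = 1.
query = """
SELECT a.encounter_id, a.effective_time, e.arrival_time
FROM (
  SELECT encounter_id, department_id, effective_time,
         ROW_NUMBER() OVER (PARTITION BY encounter_id
                            ORDER BY effective_time) AS rn
  FROM adt
  WHERE event_type IN (1, 3, 7)
) a
LEFT JOIN ed e
  ON e.encounter_id = a.encounter_id
 AND e.event_type = 50
 AND a.rn = 1
ORDER BY a.encounter_id, a.effective_time;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Putting `a.rn = 1` in the ON clause rather than in WHERE keeps the later adt rows in the result instead of filtering them out.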