Can I avoid joining the same table multiple times? - sql

Is there a way to improve the following query?
I would need an optimized version of the following query.
The reason I'm joining the Date_Table multiple times is because the ID and date_value columns are not in ascending order.
ie
ID = 1, date_value = '2022-09-07'; ID = 2, date_value = '2022-02-02'; ID = 3, date_value = '2022-11-12';
Sample data:
The maximum Date from the Agreements table is calculated based on the Date_Table.date_value column. The query will only return a row. In this case, the row highlighted in green will be the result.
Thank you so much!
SELECT * FROM Agreement
WHERE
dim_date_id = (
SELECT
Date_Table.ID
FROM (
SELECT
MAX(Date_Table.date_value) AS date_value
FROM Agreement
INNER JOIN Date_Table
ON Agreement.DIM_DATE_ID = Date_Table.ID
) AS last_day
INNER JOIN Date_Table
ON last_day.date_value = Date_Table.date_value
);

If Agreement is a large table, you should first find all the distinct date_ids, then join it to Date_Table. Also use a rank() windowing function to find the id of the most recent record:
Select Agreement.* From Agreement Inner Join (
Select ID From (
Select Date_Table.ID
,rank() Over (Order by Date_Table.date_value desc) as recent
From Date_Table Inner Join (
Select Distinct Dim_Date_ID as ID From Agreement
) A On A.ID=Date_Table.ID
) where recent=1
) X On Agreement.DIM_DATE_ID = X.ID
On first glance this looks just as complicated as your original query. But it quickly reduces the Agreement results to only a list of date ids, and especially if that field is indexed it is a fast query. Date_Table is then Inner Joined to find the best (most recent) Date_Value using a rank() function. The whole thing is filtered to retain only one record, the most recent, and that date_id is used to filter Agreement.
Again, I recommend that you index Agreement.Dim_Date_ID to make this query perform well.

Related

SQL get latest availability per member

I have a situation where I store in a table each member's availability.
It's a simple table with 4 column.
CREATE TABLE availablities (
availablity_id serial PRIMARY KEY,
member_id serial,
availablity_status_id serial,
start_date timestamp
);
Each member can have multiple records in the table and to get the current status
I get for each member the record that has the most recent start_date that is smaller then now().
I first tried with a naive Max() and Group by query
select
status_code, max(start_date) start_date,availablities.member_id
from
availablities
join
availablity_status on availablity_status.availablity_status_id = availablities.availablity_status_id
where
start_date <= now()
group by
status_code,availablities.member_id;
But this return multiple records per user as I get the most recent record by user and by status.
I finally came up with a query that gives me the expected result.
select status_code,start_date,a2.member_id from availablities a2
join availablity_status on availablity_status.availablity_status_id = a2.availablity_status_id
where a2.availablity_id in(
select
max(availablity_id)
from availablities a
where
a.member_id = a2.member_id and
start_date in(
select
max(start_date) start_date
from availablities
where
start_date <= now()
and a.member_id = availablities.member_id
)
);
But this query takes 60 times longer to execute and doesn't feel right.
I'm pretty sure there must be a better solution but I can't get my hands on it.
What is the correct way to get the expected result?
I've created a DB-fiddle to make it easier to see. Query 1 is incorrect and Query 2 is much slower when we have a couple more data.
https://www.db-fiddle.com/f/iWgvuj8kcms9F5CKuoKsny/2
It looks like you need to use a simple row_number window function here:
with a as (
select *, Row_Number() over(partition by member_id order by start_date desc, availablity_id desc) rn
from availablities
where start_date<now()
)
select s.status_code, a.start_date, a.member_id
from a join availablity_status s on s.availablity_status_id=a.availablity_status_id
where rn=1
Note your data is not selective enough, so for member_id 3, is it available or not? What is the most recent date when there are two identical dates?
I added a tie-breaker to also sort by availability_id to get your expected results
Actually it's availablity_id - you seem to have a common typo here!
See your updated Fiddle

Finding Max(Date) BEFORE specified date in Redshift SQL

I have a table (Table A) in SQL (AWS Redshift) where I've isolated my beginning population that contains account id's and dates. I'd like to take the output from that table and LEFT join back to the "accounts" table to ONLY return the start date that precedes or comes directly before the date stored in the table from my output.
Table A (Beg Pop)
-------
select account_id,
min(start_date),
min(end_date)
from accounts
group by 1;
I want to return ONLY the date that precedes the date in my current table where account_id match. I'm looking for something like...
Table B
-------
select a.account_id,
a.start_date,
a.end_date,
b.start_date_prev,
b.end_date_prev
from accounts as a
left join accounts as b on a.account_id = b.account_id
where max(b.start_date) less than a.start_date;
Ultimately, I want to return everything from table a and only the dates where max(start_date) is less than the start_date from table A. I know aggregation is not allowed in the WHERE clause and I guess I can do a subquery but I only want the Max date BEFORE the dates in my output. Any suggestions are greatly appreciated.
I want to return ONLY the date that precedes the date in my current table where account_id match
If you want the previous date for a given row, use lag():
select a.*,
lag(start_date) over (partition by account_id order by start_date) as prev_start_date
from accounts a;
As I understand from the requirement is to display all rows from a base table with the preceeding data sorted based on a column and with some conditions
Please check following example which I took from article Select Next and Previous Rows with Current Row using SQL CTE Expression
WITH CTE as (
SELECT
ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY start_date) as RN,
*
FROM accounts
)
SELECT
PreviousRow.*,
CurrentRow.*,
NextRow.*
FROM CTE as CurrentRow
LEFT JOIN CTE as PreviousRow ON
PreviousRow.RN = CurrentRow.RN - 1 and PreviousRow.account_id = CurrentRow.account_id
LEFT JOIN CTE as NextRow ON
NextRow.RN = CurrentRow.RN + 1 and NextRow.account_id = CurrentRow.account_id
ORDER BY CurrentRow.account_id, CurrentRow.start_date;
I tested with following sample data and it seems to be working
create table accounts(account_id int, start_date date, end_date date);
insert into accounts values (1,'20201001','20201003');
insert into accounts values (1,'20201002','20201005');
insert into accounts values (1,'20201007','20201008');
insert into accounts values (1,'20201011','20201013');
insert into accounts values (2,'20201001','20201002');
insert into accounts values (2,'20201015','20201016');
Output is as follows

select rows in sql with latest date from 3 tables in each group

I'm creating PREDICATE system for my application.
Please see image that I already
I have a question how can I select rows in SQL with latest date "Taken On" column tables for each "QuizESId" columns, before that I am understand how to select it but it only using one table, I learn from this
select rows in sql with latest date for each ID repeated multiple times
Here is what I have already tried
SELECT tt.*
FROM myTable tt
INNER JOIN
(SELECT ID, MAX(Date) AS MaxDateTime
FROM myTable
GROUP BY ID) groupedtt ON tt.ID = groupedtt.ID
AND tt.Date = groupedtt.MaxDateTime
What I am confused about here is how can I select from 3 tables, I hope you can guide me, of course I need a solution with good query and efficient performance.
Thanks
This is for SQL Server (you didn't specify exactly what RDBMS you're using):
if you want to get the "latest row for each QuizId" - this sounds like you need a CTE (Common Table Expression) with a ROW_NUMBER() value - something like this (updated: you obviously want to "partition" not just by QuizId, but also by UserName):
WITH BaseData AS
(
SELECT
mAttempt.Id AS Id,
mAttempt.QuizModelId AS QuizId,
mAttempt.StartedAt AS StartsOn,
mUser.UserName,
mDetail.Score AS Score,
RowNum = ROW_NUMBER() OVER (PARTITION BY mAttempt.QuizModelId, mUser.UserName
ORDER BY mAttempt.TakenOn DESC)
FROM
UserQuizAttemptModels mAttempt
INNER JOIN
AspNetUsers mUser ON mAttempt.UserId = muser.Id
INNER JOIN
QuizAttemptDetailModels mDetail ON mDetail.UserQuizAttemptModelId = mAttempt.Id
)
SELECT *
FROM BaseData
WHERE QuizId = 10053
AND RowNum = 1
The BaseData CTE basically selects the data (as you did) - but it also adds a ROW_NUMBER() column. This will "partition" your data into groups of data - based on the QuizModelId - and it will number all the rows inside each data group, starting at 1, and ordered by the second condition - the ORDER BY clause. You said you want to order by "Taken On" date - but there's no such date visible in your query - so I just guessed it might be on the UserQuizAttemptModels table - change and adapt as needed.
Now you can select from that CTE with your original WHERE condition - and you specify, that you want only the first row for each data group (for each "QuizId") - the one with the most recent "Taken On" date value.

SQL - Group By unique column combination

I am trying to write a script that will return the latest values for a unique documentid-physician-patient triplet. I need the script to act similar to a group by statement, except group by only works with one column at a time. I need to date and status information for only the most recent unique triplet. Please let me know what you will need to see from me to help. Here is the current, very bare, statement:
SELECT
TransmissionSend.CreateTimestamp,
TransmissionSendItem.Status,
TransmissionSendItem.PhysicianId,
TransmissionSendItem.DocumentIdDisplay,
Utility.SqlFunctions_NdnListToAccountList(TransmissionSendItem.NdocNum) AS AccountNum
FROM
Interface_SFAX.TransmissionSend,
Interface_SFAX.TransmissionSendItem
WHERE
TransmissionSend.ID = TransmissionSendItem.childsub --I don't know exactly what this does, I did not write this script. It must stay here though for the exact results.
ORDER BY TransmissionSend.CreateTimestamp DESC -- In the end, each latest result of the unique triplet will be ordered from most recent to oldest in return
My question is, again, how can I limit results to only the latest status for each physician id, document id, and account number combination?
First select the MAX(date) with the documentid GROUP BY documentid then select all data from the table by the first select result for example with an inner join.
SELECT table.additionalData, J.id, J.date
FROM table
INNER JOIN (SELECT id, MAX(date) AS date
FROM table GROUP BY id) AS J
ON J.id = table.id
AND J.date /* this is the max date */ = table.date

One row of data for a max date only - transact SQL

I am trying to select the max dates on a field with other tables, to only give me one distinct row for the max date and not other rows with other dates. the code i have for max is
SELECT DISTINCT
Cust.CustId,
LastDate=(Select Max(Convert(Date,TreatmentFieldHstry.TreatmentDateTime))
FROM TreatmentFieldHstry
WHERE Cust.CustSer = Course.CustSer
AND Course.CourseSer = Session.CourseSer
AND Session.SessionSer = TreatmentFieldHstry.SessionSer)
This gives multiple rows depending on how many dates - i just want one for the max - can anyone help with this?
Thanks
You didn't specify exactly what database and version you're using - but if you're on SQL Server 2005 or newer, you can use something like this (a CTE with the ROW_NUMBER ranking function) - I've simplified it a bit, since I don't know what those other tables are that you have in your select, that don't ever show up in any of the SELECT column lists.....
;WITH TopData AS
(
SELECT c.CustId, t.TreatmentDateTime,
ROW_NUMBER() OVER(PARTITION BY c.CustId ORDER BY t.TreatmentDateTime DESC) AS 'RowNum'
FROM
dbo.TreatmentFieldHstry t
INNER JOIN
dbo.Customer c ON c.CustId = t.CustId -- or whatever JOIN condition you have
WHERE
c.CustSer = Course.CustSer
)
SELECT
*
FROM
TopData
WHERE
RowNum = 1
Basically, the CTE (Common Table Expression) partitions your data by CustId and order by TreatmentDateTime (descending - newest first) - and numbers every entry with a consecutive number - for each "partition" (e.g. for each new value of CustId). With this, the newest entry for each customer has RowNum = 1 which is what I use to select it from that CTE.