Return max date from multiple tables join oracle - sql

I have 4 tables with the following relevant information I want to retrieve.
Table: Staff_profile (STAFF_ID, STAFF_USERNAME, STAFF_NAME, STAFF_JOB_ID, STAFF_FACULTY_ID, STAFF_OFF_TEL, STAFF_EMAIL) - holds staff information
Table: RFMUSERHISTORY (uh_staff_id, UH_DATETIME) - holds login history
Table: RFMUSERROLEJOBMAP (role_id, job_id) - maps roles to jobs [this is because the job table pre-exists and this new app only picks certain job ids to use against its own roles table]
Table: RFMUSERROLE (USERROLE_CODE, USERROLE_ID) - holds user roles information
Now I want to get the last login (max date for that user in userhistory) details including role and staff details for any particular person who logs in. I have had trouble with my code and finally just resorted to selecting all the records for that user with the UH_datetime ordered desc so I can pick that latest topmost record.
Here is my current code (very inefficient as described above):
SELECT a.STAFF_ID, a.STAFF_USERNAME, a.STAFF_NAME, a.STAFF_JOB_ID, a.STAFF_FACULTY_ID,
a.STAFF_OFF_TEL, a.STAFF_EMAIL, to_CHAR(b.UH_DATETIME,'Dy DD-MM-YYYY HH24:MI:SS')
AS UH_DATETIME, e.USERROLE_CODE, e.USERROLE_ID
FROM STAFF_PROFILE a
LEFT JOIN RFMUSERHISTORY b ON STAFF_ID=b.uh_staff_id
LEFT JOIN RFMUSERROLEJOBMAP d ON a.STAFF_JOB_ID=d.job_id
LEFT JOIN RFMUSERROLE e ON d.role_id=e.userrole_id
WHERE STAFF_ID=:eid1 ORDER BY b.UH_DATETIME DESC

You could use an analytic function to rank the rows and then select the most recent one. If you're really just selecting the data for a single STAFF_ID, this is probably no more efficient than nesting your original query in an outer query that selects the row using a ROWNUM predicate. If you are selecting the data for multiple staff members, however, this should be more efficient.
SELECT *
FROM (
SELECT a.STAFF_ID,
a.STAFF_USERNAME,
a.STAFF_NAME,
a.STAFF_JOB_ID,
a.STAFF_FACULTY_ID,
a.STAFF_OFF_TEL,
a.STAFF_EMAIL,
to_CHAR(b.UH_DATETIME,'Dy DD-MM-YYYY HH24:MI:SS') AS UH_DATETIME,
e.USERROLE_CODE,
e.USERROLE_ID,
dense_rank() over (partition by a.staff_id order by b.uh_datetime desc) rnk
FROM STAFF_PROFILE a
LEFT JOIN RFMUSERHISTORY b ON STAFF_ID=b.uh_staff_id
LEFT JOIN RFMUSERROLEJOBMAP d ON a.STAFF_JOB_ID=d.job_id
LEFT JOIN RFMUSERROLE e ON d.role_id=e.userrole_id
WHERE STAFF_ID=:eid1
)
WHERE rnk = 1
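To see the ranking step in isolation, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The two tables are cut-down, hypothetical versions of STAFF_PROFILE and RFMUSERHISTORY with made-up rows. It also swaps in ROW_NUMBER() for DENSE_RANK(): DENSE_RANK() returns every row tied for the latest timestamp, while ROW_NUMBER() always keeps exactly one row per staff member.

```python
import sqlite3

# Hypothetical miniature of the tables in the question; only the columns
# needed to show the ranking step are included, and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_profile (staff_id INTEGER, staff_name TEXT);
    CREATE TABLE rfmuserhistory (uh_staff_id INTEGER, uh_datetime TEXT);
    INSERT INTO staff_profile VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO rfmuserhistory VALUES
        (1, '2021-01-05 09:00:00'),
        (1, '2021-03-02 17:30:00'),
        (2, '2021-02-11 08:15:00');
""")

# Number each staff member's logins from newest to oldest, then keep row 1.
LATEST_LOGIN_SQL = """
    SELECT staff_id, staff_name, uh_datetime
    FROM (
        SELECT a.staff_id, a.staff_name, b.uh_datetime,
               ROW_NUMBER() OVER (PARTITION BY a.staff_id
                                  ORDER BY b.uh_datetime DESC) AS rn
        FROM staff_profile a
        LEFT JOIN rfmuserhistory b ON a.staff_id = b.uh_staff_id
    )
    WHERE rn = 1
    ORDER BY staff_id
"""

latest_logins = conn.execute(LATEST_LOGIN_SQL).fetchall()
for row in latest_logins:
    print(row)
```

Because of the LEFT JOIN, a staff member with no login history (Cy here) still comes back, with a NULL timestamp.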

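Oracle doesn't send rows over the network before you ask for them. If your client code only requests the first row, your query should be efficient enough.
Another option is to limit Oracle to one row with rownum. Because ROWNUM is assigned before ORDER BY is applied, the ordered query has to be wrapped in an outer query for this to return the latest row:
SELECT * FROM (<your query here, including ORDER BY b.UH_DATETIME DESC>)
WHERE rownum = 1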

Related

SQL - Min() on a Daily Query

I am trying to pull some specific information from an access control database.
I have a query providing results spanning several days. For a specific day, I need to get the first record of each person. I have totally muddled the whole thing, hence my question.
This is the code used to pull the initial query
Select
Message.TimeStamp_SPM,
Message.FirstName,
Message.LastName,
Message.CardNumber,
Message.MessageDescription,
Message.Description,
Department.Description As Description1
From
Message Inner Join
CardHolder On CardHolder.CardHolderID = Message.CardHolderID Inner Join
Department On CardHolder.DepartmentID = Department.DepartmentID
Where
Message.TimeStamp_SPM > Convert(datetime,'2021-03-02',120) And
Message.TimeStamp_SPM < Convert(datetime,'2021-03-03',120) And
Message.Description Not Like '%Truck%'
From this query I need to obtain the first record of each person for that specific date. Any advice on the most efficient way to obtain the desired result?
"From this query I need to obtain the first record of each person for that specific date."
Assuming "person" is CardHolderID, include that column in your query. You can then use window functions to get the first record for each CardHolderID. Ordering by TimeStamp_SPM ascending makes the earliest record row number 1 (order by it desc instead if you want the most recent one):
with cte as (
<your query here with CardHolderID>
)
select cte.*
from (select cte.*,
row_number() over (partition by CardHolderID order by TimeStamp_SPM asc) as seqnum
from cte
) cte
where seqnum = 1;

Why do I get extra rows in LEFT JOIN when joining to an ID and TIMESTAMP column?

I have a table that contains multiple registration periods (date and time for the start of the registration, as well as date and time for when that instance of registration ends). For each row (registration period), there is a status column that contains the status at the end of the registration period. I was trying to get the status associated with the most recent registration end date for a given ID. I've used a window function to get the most recent end date of interest per ID, and then I wanted to LEFT JOIN on ID and end date to get the status from the same table on which I used the window function. There should really be just one combination of end date and status per ID, but somehow I get more rows than there are in the left table.
Like I mentioned earlier, my approach was to use a window function to get MAX(end_date) per ID and some other column, let's call it enrollment_number. Then use LEFT JOIN on this table and its parent table to bring in status associated with that date only. Later, I'd like to use the result of this join to bring in the status associated with the end date into other tables where I need it.
WITH
my_first_test AS
(
SELECT my_id,
enrollment_number,
MAX(end_date_of_enrollment) OVER (partition by my_id, enrollment_number) AS end_date_enrolled
FROM enrollments
)
SELECT mft.my_id, mft.end_date_enrolled, e.status
FROM my_first_test AS mft
LEFT JOIN enrollments AS e
ON mft.my_id = e.my_id AND mft.end_date_enrolled = e.end_date_enrolled;
The CTE returns 42917 rows, same number of rows as in the enrollments table, which it should be if I understand it correctly.
Then, I LEFT JOIN enrollments, to bring in information from the status column also contained in the enrollments table. The LEFT JOIN is done on my_id and end_date_enrolled.
I expect 42917 rows in the resulting table, because my_id and end_date_enrolled together should be unique. However, I get slightly more rows in my final table - 44408. I was wondering if the StackOverflow community would be able to help me solve this mystery. I am using SQL in AWS Redshift.
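You have duplicates in enrollments: at least one (my_id, end date) pair appears on more than one row, so each CTE row with that pair joins to all of them. You can find them with aggregation: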
SELECT my_id, end_date_enrolled, COUNT(*)
FROM enrollments AS e
GROUP BY my_id, end_date_enrolled
HAVING COUNT(*) > 1;
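The fan-out is easy to reproduce on a toy table. The sketch below uses Python's sqlite3 module instead of Redshift (assuming SQLite 3.25+ for window functions); the table layout, the column name end_date, and the three rows are hypothetical. Two enrollments for the same my_id end on the same date, so each of their CTE rows matches both table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified enrollments table with made-up rows.
    CREATE TABLE enrollments (
        my_id INTEGER, enrollment_number INTEGER,
        end_date TEXT, status TEXT);
    INSERT INTO enrollments VALUES
        (1, 1, '2020-06-30', 'completed'),
        (1, 2, '2020-06-30', 'withdrawn'),  -- same my_id and end date as above
        (2, 1, '2020-05-31', 'completed');
""")

# Same shape as the query in the question: the window function keeps one CTE
# row per table row, then the join matches on my_id and end date only.
joined = conn.execute("""
    WITH latest AS (
        SELECT my_id, enrollment_number,
               MAX(end_date) OVER (PARTITION BY my_id, enrollment_number) AS max_end
        FROM enrollments
    )
    SELECT l.my_id, l.max_end, e.status
    FROM latest AS l
    LEFT JOIN enrollments AS e
      ON l.my_id = e.my_id AND l.max_end = e.end_date
""").fetchall()

# The duplicate check from the answer, applied to the toy table.
duplicates = conn.execute("""
    SELECT my_id, end_date, COUNT(*)
    FROM enrollments
    GROUP BY my_id, end_date
    HAVING COUNT(*) > 1
""").fetchall()

base_rows = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
print(base_rows, len(joined), duplicates)
```

Here the table has 3 rows but the join returns 5, and the aggregation reports the single pair responsible. If (my_id, enrollment_number, end date) is unique in the real data, which is an assumption to verify, adding enrollment_number to the join condition would remove the extra rows.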

Selecting rows from other tables based on the first table using SQL

I have three T-SQL statements that I'd like to combine into one, so it is just a single call to the database, not three.
SELECT * FROM Clients
The first one, selects every client from the Clients table.
SELECT * FROM History
The second one, selects all the history entries from the History table. I then use some code to find the first history for each client. i.e. first history in the table for ClientID gets set into the HasHistory column for that ClientID.
SELECT * FROM Actions
The final one, I get all the actions from the action table. I then use some code to find the last action for each client. i.e. last action in the table for ClientID gets set into the LastAction column for that ClientID.
So I'm wondering if there is a way to write an SQL statement like this for example? Note this is not real SQL, just pseudo code to illustrate what I'm trying to achieve.
SELECT *
FROM Clients
AND
SELECT First History Row
FROM History
WHERE History.ClientID = Clients.ClientID
AND
SELECT Last Action Row
FROM Actions
WHERE Actions.ClientID = Clients.ClientID
There are a number of ways you can do this, but here is one example. I'll work on it a bit at a time to explain what we are doing. You haven't shown us the table design, so the column names are a guess, but you should get the idea.
First, you have to somehow mark which history rows you care about. One way to do this is to do a query that puts an order number on every history row, that starts from 1 with every new client, and orders them by date. This way, the first history row for each client (the one you want) always has a row number of one. This would look something like
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY historyDate) AS orderNo
FROM
History
You would do something similar with actions, except you want the latest action, not the first one, so your order by column has to be in reverse order - you do this by telling the ORDER BY to use descending order, something like this
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY actionDate DESC) AS orderNo
FROM Actions
You should now have two queries where the only rows you want are marked with an order number of one. What you do now is start with your first query, and join to these other two queries so that you only join to the orderNo = 1 rows. Then all the data you want will be available in one row. You have to decide which join type to use - an inner join will only return Clients that actually have a history and an action. If you want to see clients that have no rows at all in the other tables, you need to use a left outer join. But your final query (you only need this one) will look something like
SELECT
C.*, H.*, A.*
FROM
Clients C
LEFT OUTER JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY historyDate) AS orderNo
FROM History) H ON H.clientID = C.clientID AND H.orderNo = 1
LEFT OUTER JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY actionDate DESC) AS orderNo
FROM Actions) A ON A.clientID = C.clientID AND A.orderNo = 1
What this says is: take Clients (which we'll call C), then for each row, try to join to (match a row from) the History query we looked at above (which we'll call H) where the client ID is the same and the orderNo is 1, i.e. the first history row. It also does the same for the Actions query.
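As a quick check of the combined query, here is a small sketch that runs it against an in-memory SQLite database from Python (assuming SQLite 3.25+ for ROW_NUMBER(); T-SQL itself is not involved). The column names note and actionType and all the rows are hypothetical, just like the guessed column names above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table layouts and rows, for illustration only.
    CREATE TABLE Clients (clientID INTEGER, name TEXT);
    CREATE TABLE History (clientID INTEGER, historyDate TEXT, note TEXT);
    CREATE TABLE Actions (clientID INTEGER, actionDate TEXT, actionType TEXT);
    INSERT INTO Clients VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO History VALUES
        (1, '2020-01-10', 'signed up'),
        (1, '2020-04-02', 'upgraded');
    INSERT INTO Actions VALUES
        (1, '2020-02-01', 'call'),
        (1, '2020-05-20', 'email');
""")

# One row per client: first history row (ascending date) and last action
# (descending date), each picked by joining only to orderNo = 1.
rows = conn.execute("""
    SELECT C.clientID, C.name, H.note AS firstHistory, A.actionType AS lastAction
    FROM Clients C
    LEFT OUTER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY clientID
                                     ORDER BY historyDate) AS orderNo
        FROM History
    ) H ON H.clientID = C.clientID AND H.orderNo = 1
    LEFT OUTER JOIN (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY clientID
                                     ORDER BY actionDate DESC) AS orderNo
        FROM Actions
    ) A ON A.clientID = C.clientID AND A.orderNo = 1
    ORDER BY C.clientID
""").fetchall()

for row in rows:
    print(row)
```

Client 2 has no history or actions, so the left outer joins return it with NULLs; with inner joins it would disappear from the result.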

Most efficient way to get records from a table for which a record exists in another table for each month

I have two tables as below:
User: User_ID, User_name and some other columns (has approx 1000 rows)
Fee: Created_By_User_ID, Created_Date and many other columns (has 17 million records)
Fee table does not have any index (and I can't create one).
I need a list of users for each month of a year (say 2016) who have created at least one fee record.
I do have a working query below, which takes a long time to execute. Can someone help me with a better query? Maybe using an EXISTS clause (I tried one, but it still takes time as it scans the Fee table).
SELECT MONTH(f.Created_Date), f.Created_By_User_ID
FROM Fees f
JOIN [User] u ON f.Created_By_User_ID= u.User_ID
WHERE f.Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
You will require a full scan of the fee table once in the original query you are using. If you use just the join directly, as you have in the original query, you will require multiple scans of the fee table, many of which will go through redundant rows while the join occurs. Same scenario will occur when you use an inner query as suggested by Mansoor.
An optimization could be to decrease the number of rows on which the joins are happening.
Assuming that the user table contains only one record per user and the Fee table has multiple records per person, we can attempt to find distinct months users made a purchase for by using a CTE.
Then we can make a join on top of this CTE, this will reduce the computation performed by the join and should give a slightly better output time when performing over a large data set.
Try this:
WITH CTE_UserMonthwiseFeeRecords AS
(
SELECT DISTINCT Created_By_User_ID, MONTH(Created_Date) AS FeeMonth
FROM Fee
WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
)
SELECT User_name, FeeMonth
FROM CTE_UserMonthwiseFeeRecords f
INNER JOIN [User] u ON f.Created_By_User_ID= u.User_ID
Also, you have not mentioned whether you need the user names. If only the id is required to find the distinct users creating fee records per month, you can use the query inside the CTE on its own and skip the JOIN:
SELECT DISTINCT Created_By_User_ID, MONTH(Created_Date) AS FeeMonth
FROM Fee
WHERE Created_Date BETWEEN '2016-01-01' AND '2016-12-31'
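The effect of collapsing to distinct (user, month) pairs before joining can be sketched on a few rows with Python's sqlite3 module. This is only an illustration under assumptions: the rows are made up, SQLite has no MONTH() so strftime('%m', ...) stands in for it, and a half-open date range replaces BETWEEN so that any time on 31 December would still be included if Created_Date carried a time part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, cut-down versions of the two tables with made-up rows.
    CREATE TABLE "User" (User_ID INTEGER, User_name TEXT);
    CREATE TABLE Fee (Created_By_User_ID INTEGER, Created_Date TEXT);
    INSERT INTO "User" VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Fee VALUES
        (1, '2016-01-05'), (1, '2016-01-20'), (1, '2016-03-02'),
        (1, '2016-12-31'), (2, '2016-01-09'), (2, '2015-12-31');
""")

# Reduce Fee to one row per (user, month) first, then join the small result
# to the user table to pick up the names.
user_months = conn.execute("""
    WITH CTE_UserMonthwiseFeeRecords AS (
        SELECT DISTINCT Created_By_User_ID,
               CAST(strftime('%m', Created_Date) AS INTEGER) AS FeeMonth
        FROM Fee
        WHERE Created_Date >= '2016-01-01' AND Created_Date < '2017-01-01'
    )
    SELECT u.User_name, f.FeeMonth
    FROM CTE_UserMonthwiseFeeRecords f
    INNER JOIN "User" u ON f.Created_By_User_ID = u.User_ID
    ORDER BY u.User_name, f.FeeMonth
""").fetchall()

print(user_months)
```

Six fee rows shrink to four (user, month) pairs before the join, and the 2015 row is filtered out by the date range.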
Try the query below:
SELECT MONTH(f.Created_Date), f.Created_By_User_ID
FROM Fees f
WHERE EXISTS(SELECT 1 FROM [User] u WHERE f.Created_By_User_ID= u.User_ID
AND DATEDIFF(DAY,f.Created_Date,'2016-01-01') <= 0 AND
DATEDIFF(DAY,f.Created_Date,'2016-12-31') >= 0)
You may try this approach to reduce the query run time. However, it duplicates a large amount of data into a separate table (Temp_Fees), and every DML operation on Fees or User requires Temp_Fees to be truncated and reloaded.
Select * into Temp_Fees from (SELECT MONTH(f.Created_Date) as Created_MONTH, f.Created_By_User_ID
FROM Fees f
WHERE f.Created_Date BETWEEN '2016-01-01' AND '2016-12-31' ) AS t
SELECT f.Created_MONTH, f.Created_By_User_ID
FROM Temp_Fees f
JOIN [User] u ON f.Created_By_User_ID= u.User_ID

Suppress Nonadjacent Duplicates in Report

Medical records in my Crystal Report are sorted in this order:
...
Group 1: Score [Level of Risk]
Group 2: Patient Name
...
Because patients are sorted by Score before Name, the report pulls in multiple entries per patient with varying scores - and since duplicate entries are not always adjacent, I can't use Previous or Next to suppress them. To fix this, I'd like to only display the latest entry for each patient based on the Assessment Date field - while maintaining the above order.
I'm convinced this behavior can be implemented with a custom SQL command to only pull in the latest entry per patient, but have had no success creating that behavior myself. How can I accomplish this compound sort?
Current SQL Statement in use:
SELECT "EpisodeSummary"."PatientID",
"EpisodeSummary"."Patient_Name",
"EpisodeSummary"."Program_Value",
"RiskRating"."Rating_Period",
"RiskRating"."Assessment_Date",
"RiskRating"."Episode_Number",
"RiskRating"."PatientID",
"Facility"."Provider_Name"
FROM (
"SYSTEM"."EpisodeSummary"
"EpisodeSummary"
LEFT OUTER JOIN "FOOBARSYSTEM"."RiskAssessment" "RiskRating"
ON (
("EpisodeSummary"."Episode_Number"="RiskRating"."Episode_Number")
AND
("EpisodeSummary"."FacilityID"="RiskRating"."FacilityID")
)
AND
("EpisodeSummary"."PatientID"="RiskRating"."PatientID")
), "SYSTEM"."Facility" "Facility"
WHERE (
"EpisodeSummary"."FacilityID"="Facility"."FacilityID"
)
AND "RiskRating"."PatientID" IS NOT NULL
ORDER BY "EpisodeSummary"."Program_Value"
The SQL code below may not be exactly correct, depending on the structure of your tables. The code below assumes the 'duplicate risk scores' were coming from the RiskAssessment table. If this is not correct, the code may need to be altered.
Essentially, we create a derived table and create a row_number for each record, based on the patientID and ordered by the assessment date - The most recent date will have the lowest number (1). Then, on the join, we restrict the resultset to only select record #1 (each patient has its own rank #1).
If this doesn't work, let me know and provide some table details. Should the Facility table be the starting point? Are there multiple entries in EpisodeSummary per patient? Thanks!
SELECT es.PatientID
,es.Patient_Name
,es.Program_Value
,rrd.Rating_Period
,rrd.Assessment_Date
,rrd.Episode_Number
,rrd.PatientID
,f.Provider_Name
FROM SYSTEM.EpisodeSummary es
LEFT JOIN (
--Derived table retrieving the most recent risk assessment for each patient
SELECT PatientID
,Assessment_Date
,Episode_Number
,FacilityID
,Rating_Period
,ROW_NUMBER() OVER (
PARTITION BY PatientID ORDER BY Assessment_Date DESC
) AS RN -- This code generates a row number for each record. The count is restarted for every patientID and the count starts at the most recent date.
FROM RiskAssessment
) rrd
ON es.patientID = rrd.patientid
AND es.episode_number = rrd.episode_number
AND es.facilityid = rrd.facilityid
AND rrd.RN = 1 --This only retrieves one record per patient (the most recent date) from the riskassessment table
INNER JOIN SYSTEM.Facility f
ON es.facilityid = f.facilityid
WHERE rrd.PatientID IS NOT NULL
ORDER BY es.Program_Value