Selecting rows from other tables based on the first table using SQL - sql

I have three T-SQL statements that I'd like to combine into one, so it is just a single call to the database, not three.
SELECT * FROM Clients
The first one selects every client from the Clients table.
SELECT * FROM History
The second one selects all the history entries from the History table. I then use some code to find the first history entry for each client, i.e. the first history row in the table for a ClientID gets set into the HasHistory column for that ClientID.
SELECT * FROM Actions
The final one gets all the actions from the Actions table. I then use some code to find the last action for each client, i.e. the last action row in the table for a ClientID gets set into the LastAction column for that ClientID.
So I'm wondering if there is a way to write an SQL statement like this for example? Note this is not real SQL, just pseudo code to illustrate what I'm trying to achieve.
SELECT *
FROM Clients
AND
SELECT First History Row
FROM History
WHERE History.ClientID = Clients.ClientID
AND
SELECT Last Action Row
FROM Actions
WHERE Actions.ClientID = Clients.ClientID

There are a number of ways you can do this, but here is one example. I'll work on it a bit at a time to explain what we are doing. You haven't shown us the table design, so the column names are a guess, but you should get the idea.
First, you have to somehow mark which history rows you care about. One way to do this is to do a query that puts an order number on every history row, that starts from 1 with every new client, and orders them by date. This way, the first history row for each client (the one you want) always has a row number of one. This would look something like
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY historyDate) AS orderNo
FROM
History
You would do something similar with actions, except you want the latest action, not the first one, so your order by column has to be in reverse order - you do this by telling the ORDER BY to use descending order, something like this
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY actionDate DESC) AS orderNo
FROM Actions
You should now have two queries where the only rows you want are marked with an order number of one. What you do now is start with your first query and join to these other two queries so that you only join to the orderNo = 1 rows. Then all the data you want will be available in one row. You have to decide which join type to use: an inner join will only return clients that actually have both a history and an action. If you want to see clients that have no rows at all in the other tables, you need to use a left outer join. Your final query (you only need this one) will look something like
SELECT
C.*, H.*, A.*
FROM
Clients C
LEFT OUTER JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY historyDate) AS orderNo
FROM History) H ON H.clientID = C.clientID AND H.orderNo = 1
LEFT OUTER JOIN
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY actionDate DESC) AS orderNo
FROM Actions) A ON A.clientID = C.clientID AND A.orderNo = 1
What this says is: take Clients (which we'll call C), then for each row, try to join to (match a row from) the History query we looked at above (which we'll call H) where the client ID is the same and the orderNo is 1, i.e. the first history row. It then does the same for the Actions query.
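To see the combined query run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server (it needs an SQLite build with window functions, 3.25 or later). The column names clientID, historyDate and actionDate are the same guesses used above; name, note and action, and all the sample rows, are made up for illustration.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Clients (clientID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE History (clientID INTEGER, historyDate TEXT, note TEXT);
CREATE TABLE Actions (clientID INTEGER, actionDate TEXT, action TEXT);
INSERT INTO Clients VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO History VALUES
    (1, '2020-01-01', 'signed up'),
    (1, '2020-02-01', 'renewed'),
    (2, '2020-03-01', 'signed up');
INSERT INTO Actions VALUES
    (1, '2021-01-01', 'call'),
    (1, '2021-06-01', 'email'),
    (2, '2021-02-01', 'visit');
""")

# One round trip: each client with its first history row and last action.
query = """
SELECT C.clientID, C.name, H.note AS firstHistory, A.action AS lastAction
FROM Clients C
LEFT OUTER JOIN
    (SELECT *,
            ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY historyDate) AS orderNo
     FROM History) H ON H.clientID = C.clientID AND H.orderNo = 1
LEFT OUTER JOIN
    (SELECT *,
            ROW_NUMBER() OVER (PARTITION BY clientID ORDER BY actionDate DESC) AS orderNo
     FROM Actions) A ON A.clientID = C.clientID AND A.orderNo = 1
ORDER BY C.clientID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Client 3 has no history and no actions, so the left outer joins keep it with NULL (None in Python) in both extra columns; with inner joins it would disappear from the result.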

Related

SQL - Min() on a Daily Query

I am trying to pull some specific information from an access control database.
I have a query providing results spanning several days. For a specific day, I need to get the first record of each person for that day. I have totally muddled the whole thing, hence my question.
This is the code used to pull the initial query
Select
Message.TimeStamp_SPM,
Message.FirstName,
Message.LastName,
Message.CardNumber,
Message.MessageDescription,
Message.Description,
Department.Description As Description1
From
Message Inner Join
CardHolder On CardHolder.CardHolderID = Message.CardHolderID Inner Join
Department On CardHolder.DepartmentID = Department.DepartmentID
Where
Message.TimeStamp_SPM > Convert(datetime,'2021-03-02',120) And
Message.TimeStamp_SPM < Convert(datetime,'2021-03-03',120) And
Message.Description Not Like '%Truck%'
From this query I need to obtain the first record of each person for that specific date. Any advice on the most efficient way to obtain the desired result?
From this query I need to obtain the first record of each person for that specific date.
Assuming "person" is CardHolderID, include that column in your query. You can then use a window function to get the first (earliest) record for each CardHolderID:
with cte as (
<your query here with CardHolderID>
)
select cte.*
from (select cte.*,
row_number() over (partition by CardHolderID order by TimeStamp_SPM) as seqnum
from cte
) cte
where seqnum = 1;

How to work past "At most one record can be returned by this subquery"

I'm having trouble understanding this error through all the researching I have done. I have the following query
SELECT M.[PO Concatenate], Sum(M.SumofAward) AS TotalAward, (SELECT TOP 1 M1.[Material Group] FROM
[MGETCpreMG] AS M1 WHERE M1.[PO Concatenate]=M.[PO Concatenate] ORDER BY M1.SumofAward DESC) AS TopGroup
FROM MGETCpreMG AS M
GROUP BY M.[PO Concatenate];
For a brief instant it shows the results I want, but then the "At most one record can be returned by this subquery" error appears and all the data is replaced with #Name?
For context, [MGETCpreMG] is a query off a main table [MG ETC] that was used to consolidate Award for differing Material Groups on a PO transaction ([PO Concatenate])
SELECT [MG ETC].[PO Concatenate], Sum([MG ETC].Award) AS SumOfAward, [MG ETC].[Material Group]
FROM [MG ETC]
GROUP BY [MG ETC].[PO Concatenate], [MG ETC].[Material Group]
ORDER BY [MG ETC].[PO Concatenate];
I'm thinking it lies in my inability to understand how to utilize a subquery.
This happens when the subquery can return more than one row: in Access, TOP 1 returns every row tied for first place on the ORDER BY columns. Simply add an additional sort column.
So, a common sub query might be to get the last invoice. So you might have:
select ID, CompanyName,
(SELECT TOP 1 InvoiceDate from tblInvoice
where tblInvoice.CustomerID = tblCustomers.ID
Order by InvoiceDate DESC)
As LastInvoiceDate
From tblCustomers
Now the above might work for some time, but then it will blow up since you might have two invoices for the same day!
So, all you have to do is add that extra order by clause - say on the PK of the child table like this:
Order by InvoiceDate DESC,ID DESC)
So top 1 will respect the "additional" order columns you add, and thus only ever return one row - even if there are multiple values that match the top 1 column.
I suppose in the above we could forget InvoiceDate and always take the highest autonumber ID, but for a lot of queries you can't be sure that works; for example, we might want the most expensive invoice. Again, if two invoices share the maximum amount, two rows could be returned. So simply add a second column to the ORDER BY clause that further orders the data, and TOP 1 will only pull the first row. Your TopGroup subquery is such a case: just tack on an extra ORDER BY column, "ID" or whatever the autonumber ID column is.

SQL JOIN to select MAX value among multiple user attempts returns two values when both attempts have the same value

Good morning, everyone!
I have a pretty simple SELECT/JOIN statement that gets some imported data from a placement test and returns the highest scored attempt a user made, the best score. Users can take this test multiple times, so we just use the best attempt. What if a user makes multiple attempts (say, takes it twice,) and receives the SAME score both times?
My current query ends up returning BOTH of those records, as they're both equal, so MAX() matches both. There are no primary keys set up on this yet. The query below is the one I hope to use in an INSERT statement for another table, once I get only a SINGLE best attempt per user (StudentID) and can set that StudentID as the key. So you see my problem...
I've tried a few DISTINCT or TOP statements in my query but either I'm putting them into the wrong part of the query or they still return two records for a user who had identically scored attempts. Any suggestions?
SELECT p.*
FROM
(SELECT
StudentID, MAX(PlacementResults) AS PlacementResults
FROM AleksMathResults
GROUP BY StudentID)
AS mx
JOIN AleksMathResults p ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY
StudentID
Sounds like you want row_number():
SELECT amr.*
FROM (SELECT amr.*,
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY PlacementResults DESC) as seqnum
FROM AleksMathResults amr
) amr
WHERE seqnum = 1;
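The difference between the two approaches shows up on tied scores. Below is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later for window functions). The AttemptDate column and the sample rows are assumptions added for illustration; AttemptDate serves as a tie-breaker so that the row picked among equal scores is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- AttemptDate is a hypothetical column; the original table layout is not shown.
CREATE TABLE AleksMathResults (StudentID INTEGER, AttemptDate TEXT, PlacementResults INTEGER);
INSERT INTO AleksMathResults VALUES
    (1, '2021-01-10', 70),
    (1, '2021-02-10', 70),   -- same score as the first attempt
    (2, '2021-01-15', 55),
    (2, '2021-02-15', 80);
""")

# Joining back on MAX() matches every attempt that has the best score.
join_rows = conn.execute("""
SELECT p.StudentID, p.PlacementResults
FROM (SELECT StudentID, MAX(PlacementResults) AS PlacementResults
      FROM AleksMathResults
      GROUP BY StudentID) AS mx
JOIN AleksMathResults p
  ON mx.StudentID = p.StudentID AND mx.PlacementResults = p.PlacementResults
ORDER BY p.StudentID
""").fetchall()

# ROW_NUMBER() numbers tied rows 1, 2, ... so exactly one row per student survives.
row_number_rows = conn.execute("""
SELECT StudentID, AttemptDate, PlacementResults
FROM (SELECT amr.*,
             ROW_NUMBER() OVER (PARTITION BY StudentID
                                ORDER BY PlacementResults DESC, AttemptDate DESC) AS seqnum
      FROM AleksMathResults amr) amr
WHERE seqnum = 1
ORDER BY StudentID
""").fetchall()

print(join_rows)        # student 1 appears twice
print(row_number_rows)  # one row per student
```

Without a second ORDER BY key, ROW_NUMBER() still returns one row per student, but which of the tied attempts it keeps is arbitrary.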

Suppress Nonadjacent Duplicates in Report

Medical records in my Crystal Report are sorted in this order:
...
Group 1: Score [Level of Risk]
Group 2: Patient Name
...
Because patients are sorted by Score before Name, the report pulls in multiple entries per patient with varying scores - and since duplicate entries are not always adjacent, I can't use Previous or Next to suppress them. To fix this, I'd like to only display the latest entry for each patient based on the Assessment Date field - while maintaining the above order.
I'm convinced this behavior can be implemented with a custom SQL command to only pull in the latest entry per patient, but have had no success creating that behavior myself. How can I accomplish this compound sort?
Current SQL Statement in use:
SELECT "EpisodeSummary"."PatientID",
"EpisodeSummary"."Patient_Name",
"EpisodeSummary"."Program_Value",
"RiskRating"."Rating_Period",
"RiskRating"."Assessment_Date",
"RiskRating"."Episode_Number",
"RiskRating"."PatientID",
"Facility"."Provider_Name"
FROM (
"SYSTEM"."EpisodeSummary"
"EpisodeSummary"
LEFT OUTER JOIN "FOOBARSYSTEM"."RiskAssessment" "RiskRating"
ON (
("EpisodeSummary"."Episode_Number"="RiskRating"."Episode_Number")
AND
("EpisodeSummary"."FacilityID"="RiskRating"."FacilityID")
)
AND
("EpisodeSummary"."PatientID"="RiskRating"."PatientID")
), "SYSTEM"."Facility" "Facility"
WHERE (
"EpisodeSummary"."FacilityID"="Facility"."FacilityID"
)
AND "RiskRating"."PatientID" IS NOT NULL
ORDER BY "EpisodeSummary"."Program_Value"
The SQL code below may not be exactly correct, depending on the structure of your tables. The code below assumes the 'duplicate risk scores' were coming from the RiskAssessment table. If this is not correct, the code may need to be altered.
Essentially, we create a derived table and generate a row number for each record, partitioned by PatientID and ordered by the assessment date, so the most recent date gets the lowest number (1). Then, on the join, we restrict the result set to row number 1 only (each patient has its own row 1).
If this doesn't work, let me know and provide some table details. Should the Facility table be the starting point? Are there multiple entries in EpisodeSummary per patient? Thanks!
SELECT es.PatientID
,es.Patient_Name
,es.Program_Value
,rrd.Rating_Period
,rrd.Assessment_Date
,rrd.Episode_Number
,rrd.PatientID
,f.Provider_Name
FROM SYSTEM.EpisodeSummary es
LEFT JOIN (
-- Derived table retrieving the most recent risk assessment for each patient
SELECT PatientID
,Assessment_Date
,Episode_Number
,FacilityID
,Rating_Period
,ROW_NUMBER() OVER (
PARTITION BY PatientID ORDER BY Assessment_Date DESC
) AS RN -- This code generates a row number for each record. The count is restarted for every patientID and the count starts at the most recent date.
FROM RiskAssessment
) rrd
ON es.patientID = rrd.patientid
AND es.episode_number = rrd.episode_number
AND es.facilityid = rrd.facilityid
AND rrd.RN = 1 --This only retrieves one record per patient (the most recent date) from the riskassessment table
INNER JOIN SYSTEM.Facility f
ON es.facilityid = f.facilityid
WHERE rrd.PatientID IS NOT NULL
ORDER BY es.Program_Value

Using Rank vs Max (T-SQL)

I'm currently troubleshooting an old job whose query takes a long time to run. The old job uses the first query, but I have been testing with the second query.
Differences between:
Select Max(Cl1) as Tab,
Max(Cl2) as Tb,
Customer
From TableA
group by Customer
vs
Select Customer,
Tab,
tb
From
(Select Customer,
Tab,
tb,
Rank() over (partition by Customer order by Cl1 desc) rk1,
Rank() over (partition by Customer order by Cl2 desc) rk2
From TableA) X
Where X.rk1 = 1 and X.rk2 = 1
Tab      Tb       Customer
A45845   100052   Shin
A45845   100053   Shin
A45845   100054   Reek
The table will always have value (no nulls or blank value) for both Tab and Tb columns. Tab is not unique to a particular customer. Tb is a sequential and continuously increasing integer with no duplicates possible (unique). The latest Tab value for a customer will also have the most recent Tb as well.
Though the results are the same, is there something I may not be considering when changing the query in this case?
Edit: Fixed errors in the second query, introduced when building the example without the real column and table names. Also expanded on the scenario. My apologies for the updated info and the fix to the original post; I was called away before I even had a chance to double-check it.
Seriously doubt Rank is going to be faster.
Where you would need rank is if you wanted the value of CL2 on the row where CL1 is max.
Do you have indexes on Customer, CL1, and CL2?
Check fragmentation.
Check the execution plans.
And no way those are returning the same results.
This would be the equivalent query, but I doubt it would be more efficient. @Damien_The_Unbeliever, please let me know if I'm wrong again. (DISTINCT is added for the scenario where there are multiple Cl1 and Cl2 rows with the same value. It can be removed if the primary key spans Customer, Cl1 and Cl2.)
Select Distinct Customer,
X.Cl1 as Tab,
X2.Cl2 as tb
From (Select Customer,
Rank() over (partition by Customer order by Cl1 desc) rk1,
Cl1
From TableA) X
Join (Select Customer,
Rank() over (partition by Customer order by Cl2 desc) rk2,
Cl2
From TableA) X2
On X.Customer = X2.Customer
Where X.rk1 = 1
And X2.rk2 = 1
The first query returns values for Tab and Tb, the second always returns 1 and 1 for the two columns (because of the where clause).
The second will only return customers where there is a row that is the first by both CL1 and Cl2. The first returns a row for all customers.
The second will return duplicates, when multiple rows satisfy the two ordering conditions for a given customer. The first returns only one row per customer.
The second has a syntax error, so it will not run (the comma after rk2). The first seems to be valid SQL syntax.
The first seems simpler and more understandable. However, I think my first point is the biggest difference. (I'm ignoring the fact that the columns are in different orders.)
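The point about missing customers can be checked on a small example. This is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later for window functions). The Shin and Reek rows come from the sample data in the question; the Dana rows are hypothetical, added to show a customer whose highest Cl1 and highest Cl2 sit on different rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (Customer TEXT, Cl1 TEXT, Cl2 INTEGER);
INSERT INTO TableA VALUES
    ('Shin', 'A45845', 100052),
    ('Shin', 'A45845', 100053),
    ('Reek', 'A45845', 100054),
    ('Dana', 'B2', 1),   -- hypothetical: highest Cl1 for Dana
    ('Dana', 'B1', 2);   -- hypothetical: highest Cl2 for Dana
""")

# MAX() is computed per column, so every customer gets exactly one row.
max_rows = conn.execute("""
SELECT Customer, MAX(Cl1) AS Tab, MAX(Cl2) AS Tb
FROM TableA
GROUP BY Customer
ORDER BY Customer
""").fetchall()

# RANK() keeps only rows that rank first on both columns at once.
rank_rows = conn.execute("""
SELECT Customer, Cl1 AS Tab, Cl2 AS Tb
FROM (SELECT Customer, Cl1, Cl2,
             RANK() OVER (PARTITION BY Customer ORDER BY Cl1 DESC) AS rk1,
             RANK() OVER (PARTITION BY Customer ORDER BY Cl2 DESC) AS rk2
      FROM TableA) X
WHERE X.rk1 = 1 AND X.rk2 = 1
ORDER BY Customer
""").fetchall()

print(max_rows)   # includes Dana, combining values from two different rows
print(rank_rows)  # Dana is missing: no single row is first on both columns
```

With data that satisfies the question's guarantee (the latest Tab for a customer always carries the most recent Tb), both queries return the same rows, so the difference only matters if that guarantee can break.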