Using a Correlated Subquery within a Left Join - sql

I need a fresh set of eyes on this query. Without getting mega in depth in this code my problem is I'm doing a left join to pull from the TXP_Digital_Signatures (tds) table which stores signatures to the most current version of Treatment Plans (txp_master txp). What this code is doing is bringing back results where tds.signed is null (no signature) or marked N (No). This works, but what this report has done is show people what No's need to become yes, but that is leaving the No left behind, so if there is a more recent Yes then the No in that version of the tds.plan_id it is still pulling that plan_id where I no longer want it where the most recent signature status is a Y (yes), etc. The code snippet below added to the where statement works, but it hides all No's even if there is not a newer Y (yes).
tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
Can anyone think of a way to either add a correlated subquery to the left join, so it only pulls the max(tds.date) for each tds.plan_id or how to rework my where statements so the no's without a newer yes and the null's still show up. I really don't want to redo the entire report as a grouped report if I can help it where I feel it'll break a ton of stuff on me and basically have me redoing this report from scratch. SQL 2008 R2
SELECT case_status,
CONVERT(CHAR(10), episode_open_date, 101)AS 'Enrolled' ,
txp.patient_id,
p.lname+', ' + p.fname AS 'Client',
CONVERT(CHAR(10), txp.effective_date, 101)AS 'Effective',
CONVERT(CHAR(10), next_review_date, 101)AS 'Review',
txp.signed,
(SELECT location_code FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Clinic',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Coordinator',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = ts.team_member_id ) AS 'Team',
ts.signed,
tds.signed as 'Patient Sig'
FROM txp_master txp join patient p ON p.patient_id = txp.patient_id and p.episode_id = txp.episode_id
join txp_signature ts on ts.plan_id = txp.plan_id and ts.version_no = txp.version_no and ts.team_member_id <> txp.txp_coordinator_id
left join TXP_Digital_Signatures tds on tds.plan_id = txp.plan_id
where p.case_status = 'A' and
txp.status <> 'er' and patient_signed_date is null
and tds.signed is null or tds.signed = 'N'
and txp.effective_date > '2016-12-31 00:00:00.000'
and tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
order by patient_id

Your query should work if you correct the sub-query, like this:
(select Max(date) from TXP_Digital_Signatures x where x.plan_id = tds.plan_id)
currently, you are not filtering the sub-query TXP_Digital_Signatures.
One other thing to take note of is that you have a LEFT JOIN on TXP_Digital_Signatures tds yet you include it on the WHERE clause. This will convert it to an INNER JOIN. So decide on what join you require and change accordingly.
If you want results regardless of TXP_Digital_Signatures tds then move those conditions to the ON clause.
If you only want results based on TXP_Digital_Signatures tds then change to INNER JOIN

Related

sql counting the number is not working correctly

I make related queries and the counting does not work correctly, when I connect 4 and join and add a condition, it does not count correctly, but without the 4th joina and the condition it works correctly. first option result = 2
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
GROUP BY
pxixolog_details.id
and the second one doesn't work correctly. result = 4
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
LEFT JOIN psixologs_times ON pxixolog_details.id = psixologs_times.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
AND (psixologs_times.time = '09:00' OR psixologs_times.time = '10:00')
GROUP BY
pxixolog_details.id
what am I doing wrong?
You get double the amount of results when doing 4 JOINs because through the new (4th) JOIN you allow 2 records (9:00 and 10:00 o'clock) for each of the other joined records in the first 3 JOINs. That can lead to the observed result.
Check your data and make sure that your 4th JOIN condition yields a 1:1 record matching with the other data.
The last table has psixologs_times matches multiple rows for each psixolog_id.
You can easily see this using a query:
select psixolog_id, count(*)
from psixologs_times
group by psixolog_id
having count(*) > 1;
How you fix this problem depends on what you want to do. The simplest solution is to use count(distinct):
COUNT(DISTINCT directions.direction) as procent
However, this might just be hiding the problem. You might want to choose one row from the psixologs_times table. Or pre-aggregate it. Or do something else.

Removing Duplicate Rows in SQL

I have a query that returns a list of devices that have multiple "moved" dates. I only want the oldest date entry. I used the MIN function to give me the oldest date, but I'm still getting multiple entries (I am, however, getting less than before). I tried to get a more precise JOIN, but I couldn't narrow the fields down any more.
If you'll look at the screenshot, the first three rows have the same "wonum" but three different "Moved Dates." I am thinking that if I can somehow take the oldest "Moved Date" out of those three and remove the other rows, that would give me the result I'm looking for. However, I'm not skilled enough to do that (I've only been working in SQL for a few months now). Would that work, or is there a better way to narrow down my results? I'm wondering if I need to perform some kind of sub-query to get what I need.
I've looked around but can't find anything that allows me to remove a row of data the way I'm looking to. Nor can I seem to find a reason my MIN function isn't paring down the data anymore than it is. Below is the code I'm currently using. Thanks for any help that can be given.
SELECT wo.wonum, wo.location, wo.statusdate, wo.status, l.subcontractor,
wo.description, MIN(ast.datemoved) AS 'Moved Date'
FROM workorder wo
JOIN locations l ON wo.location = l.location
JOIN asset a ON wo.location = a.location
-- AND wo.assetnum = a.assetnum
JOIN assettrans ast ON a.assetnum = ast.assetnum
-- AND a.assetid = ast.assetid
WHERE wo.description LIKE '%deteriorating%'
AND wo.status != 'close'
GROUP BY wo.wonum, wo.location, wo.statusdate,
wo.status, l.subcontractor, wo.description
ORDER BY wo.wonum;
DBV SQL Query Result
Update: Table Data
You need to do the grouping in your join statement inside a subquery(not tested, but you'll get the idea):
Replace
JOIN assettrans ast ON a.assetnum = ast.assetnum
With
inner join
(
select ast.assetnum,MIN(ast.datemoved) AS 'Moved Date'
from assettrans ast
group by ast.assetnum
) grouped
on a.assetnum = grouped.assetnum
So the full query looks like:
SELECT wo.wonum, wo.location, wo.statusdate, wo.status, l.subcontractor,
wo.description, grouped.MovedDate
FROM workorder wo
JOIN locations l ON wo.location = l.location
JOIN asset a ON wo.location = a.location
INNER JOIN
(
select ast.assetnum,MIN(ast.datemoved) AS MovedDate
from assettrans ast
group by ast.assetnum
) grouped
on a.assetnum = grouped.assetnum
WHERE wo.description LIKE '%deteriorating%'
AND wo.status != 'close'
ORDER BY wo.wonum;
Please test before using in production
--if you have id column and leave the oldest record
delete from T1 from MyTable T1, MyTable T2
where T1.dupField = T2.dupField (and add more filters if applies)
and
T1.uniqueField > T2.uniqueField
--if you want to delete the new "Moved Dates" and leave the oldest one
delete from T1 from MyTable T1, MyTable T2
where T1.dupField = T2.dupField (and add more filters if applies)
and
T1.Moved Dates > T2.Moved Dates

Matching values from two columns with the values of two columns in a sub-query

I have a problem I can't really figure out, even though I thought I had the solution.
I think this is DB2 SQL by the way.
I have a customer number and a country code (extracted from a string using SUBSTR) which I don't want to find in combination in a subquery, like so:
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country
FROM db811.bet_utl bu
LEFT JOIN db811.henv_utl bh
ON bu.betaling_urn = bh.betaling_urn
LEFT JOIN db811.betaling_status bs
ON bu.betaling_status = bs.betaling_status
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE bu.kanal = 'N'
AND (ku.orgnr, Substr(bu.bank_account_swiftadr,5,2)) ;
Not in the results below
SELECT ku.orgnr AS customer ,
Substr(bu.bank_account_swiftadr,5,2) AS country ,
COUNT(*) AS numberof
FROM db811.bet_utl_hist bu
LEFT JOIN db811.kunde_orgnr ku
ON bu.kundenr = ku.kundenr
WHERE
and bu.kanal = 'N'
AND bu.betalingsdato > '2016-01-01'
GROUP BY ku.orgnr ,
substr(bu.bank_account_swiftadr,5,2);
This should work I though, but it seems to match on just one of them, and I need both to be true in order for me to exclude it with the NOT IN.
I assume I am missing something basic since I am quite new at this.

SQL Coding, Unable to Limit based on where in subquery

I have a query I'm writing, and it's nearly perfect, excepting one error I can't seem to control for.
SELECT Clients.client_id,
Clients.last_name + ', ' + Clients.first_name AS Client_Name,
Query3.team_id,
Query3.team_name,
Query2.CSW_Name,
Query1.Date_of_Service AS Last_Service_by_CSW,
Query.Tx_Start_Date AS Tx_Start_Date,
Query.Tx_End_Date AS Tx_End_Date
FROM Clients
LEFT OUTER JOIN
(SELECT TxPlus.client_id,
Max(TxPlus.start_date) AS Tx_Start_Date,
Max(TxPlus.end_date) AS Tx_End_Date
FROM TxPlus
GROUP BY TxPlus.client_id) Query ON Query.client_id = Clients.client_id
LEFT OUTER JOIN
(SELECT EmployeeClients.client_id,
Employees.last_name + ', ' + Employees.first_name AS CSW_Name
FROM EmployeeClients
INNER JOIN Employees ON EmployeeClients.emp_id = Employees.emp_id
WHERE EmployeeClients.case_manager = 1) Query2 ON Clients.client_id = Query2.client_id
LEFT OUTER JOIN
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
GROUP BY ClientVisit.client_id,
EmployeeClients.case_manager,
ClientVisit.visittype_id
HAVING EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9) Query1 ON Clients.client_id = Query1.client_id
LEFT OUTER JOIN
(SELECT TeamClient.client_id,
Team.team_id,
Team.team_name
FROM TeamClient
INNER JOIN Team ON TeamClient.team_id = Team.team_id
WHERE TeamClient.primary_flag = 1) Query3 ON Clients.client_id = Query3.client_id
WHERE Clients.client_status LIKE 'Active'
The objective is to basically pull a client record, and show the dates for their Treatment Plan, their Primary Team Assignments, their case manager, and the most Community Support service there case manager provided for them.
Right now, with the code as it's written, I get all the correct information, EXCEPT the most recent Community Support date. This returns the most recent service done by ANYONE, not the Case Manager.
I'm certain its a very simple problem, but it's driving me up the wall.
You all are an invaluable resource, and I thank you in advance.
Well, without knowing what the data looks like, I can't be 100% certain that this is the answer, but I have a pretty good feeling about it. I believe that this is the query that is returning your, "Community Support Date". (It wasn't identified as clearly as I would have liked, but it looks to me like this is it.)
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
GROUP BY ClientVisit.client_id,
EmployeeClients.case_manager,
ClientVisit.visittype_id
HAVING EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9) Query1 ON Clients.client_id = Query1.client_id
So what you're doing is creating a subset of the data in these two tables, joined by emp_id and grouped by client, case manager and visit type. Then you remove any groups that are not the case manager or visit type you want. But by this point, you've already selected your Max(ClientVisit.rev_timeout) AS Date_of_Service. I believe that if you change your HAVING clause into a WHERE clause (and thus filter before you group) you will get the results you're looking for.
EDIT:
(SELECT ClientVisit.client_id,
Max(ClientVisit.rev_timeout) AS Date_of_Service
FROM ClientVisit
INNER JOIN EmployeeClients ON ClientVisit.emp_id = EmployeeClients.emp_id
WHERE EmployeeClients.case_manager = 1
AND ClientVisit.visittype_id = 9
GROUP BY ClientVisit.client_id) Query1 ON Clients.client_id = Query1.client_id

Left Outer Join in SQL Server 2014

We are currently upgrading to SQL Server 2014; I have a join that runs fine in SQL Server 2008 R2 but returns duplicates in SQL Server 2014. The issue appears to be with the predicate AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO for if I change it to anything but 4, I do not get the duplicates. The query is returning those values in Accounting Period 4 twice. This query gets account balances for all the previous Accounting Periods so in this case it returns values for Accounting Periods 0, 1, 2 and 3 correctly but then duplicates the values from Period 4.
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
LEFT JOIN PS_GL_ACCOUNT_TBL B
ON B.SETID = 'LTSHR'
LEFT OUTER JOIN PS_LEDGER L2
ON A.BUSINESS_UNIT = L2.BUSINESS_UNIT
AND A.LEDGER = L2.LEDGER
AND A.ACCOUNT = L2.ACCOUNT
AND A.ALTACCT = L2.ALTACCT
AND A.DEPTID = L2.DEPTID
AND A.PROJECT_ID = L2.PROJECT_ID
AND A.DATE_CODE = L2.DATE_CODE
AND A.BOOK_CODE = L2.BOOK_CODE
AND A.GL_ADJUST_TYPE = L2.GL_ADJUST_TYPE
AND A.CURRENCY_CD = L2.CURRENCY_CD
AND A.STATISTICS_CODE = L2.STATISTICS_CODE
AND A.FISCAL_YEAR = L2.FISCAL_YEAR
AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO
WHERE
A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND ( (A.ACCOUNTING_PERIOD BETWEEN 1 and 4
AND B.ACCOUNT_TYPE IN ('E','R') )
OR
(A.ACCOUNTING_PERIOD BETWEEN 0 and 4
AND B.ACCOUNT_TYPE IN ('A','L','Q') ) )
AND A.STATISTICS_CODE = ' '
AND A.ACCOUNT = '21101'
AND A.CURRENCY_CD <> ' '
AND A.CURRENCY_CD = 'GBP'
AND B.SETID='LTSHR'
AND B.ACCOUNT=A.ACCOUNT
AND B.SETID = SETID
AND B.EFFDT=(SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL WHERE SETID='LTSHR' AND WHERE ACCOUNT=B.ACCOUNT AND EFFDT<='2015-01-31 00:00:00.000')
GROUP BY A.ACCOUNT
ORDER BY A.ACCOUNT
I'm inclined to suspect that you have simplified your original query too much to reflect the real problem, but I'm going to answer the question as posed, in light of the comments on it to this point.
Since your query does not in fact select anything derived from table L2, nor do any other predicates rely on anything from that table, the only thing accomplished by (left) joining it is to duplicate rows of the pre-aggregation results where more than one satisfies the join condition for the same L2 row. That seems unlikely to be what you want, especially with that particular join being a self join, so I don't see any reason not to remove it altogether. Dollars to doughnuts, that solves the duplication problem.
I'm also going to suggest removing the correlated subquery in the WHERE clause in favor of joining an inline view, since you already join the base table for the subquery anyway. This particular inline view uses the window function version of MAX() instead of the aggregate function version. Ideally, it would directly select only the rows with the target EFFDT values, but it cannot do so without being rather more complicated, which is exactly what I am trying to avoid. The resulting query therefore filters EFFDT externally, as the original did, but without a correlated subquery.
I furthermore removed a few redundant predicates and rewrote one of the messier ones to a somewhat nicer equivalent. And I reordered the predicates in a way that seems more logical to me.
Additionally, since you are filtering on a specific value of A.ACCOUNT, it is pointless (but not wrong) to GROUP BY or ORDER_BY that column. Accordingly, I have removed those clauses to make the query simpler and clearer.
Here's what I came up with:
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
INNER JOIN (
SELECT
*,
MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
FROM PS_GL_ACCOUNT_TBL
WHERE
EFFDT <= '2015-01-31 00:00:00.000'
AND SETID = 'LTSHR'
) B
ON B.ACCOUNT=A.ACCOUNT
WHERE
A.ACCOUNT = '21101'
AND A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND A.CURRENCY_CD = 'GBP'
AND A.STATISTICS_CODE = ' '
AND B.EFFDT = B.MAX_EFFDT
AND CASE
WHEN B.ACCOUNT_TYPE IN ('E','R')
THEN A.ACCOUNTING_PERIOD BETWEEN 1 and 4
WHEN B.ACCOUNT_TYPE IN ('A','L','Q')
THEN A.ACCOUNTING_PERIOD BETWEEN 0 and 4
ELSE 0
END