Left Outer Join in SQL Server 2014

We are currently upgrading to SQL Server 2014. I have a join that runs fine in SQL Server 2008 R2 but returns duplicates in SQL Server 2014. The issue appears to be with the predicate AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO: if I change it to anything but 4, I do not get the duplicates. The query returns the values for Accounting Period 4 twice. It is meant to get account balances for all the previous Accounting Periods, so in this case it returns the values for Accounting Periods 0, 1, 2 and 3 correctly but then duplicates the values from Period 4.
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
LEFT JOIN PS_GL_ACCOUNT_TBL B
ON B.SETID = 'LTSHR'
LEFT OUTER JOIN PS_LEDGER L2
ON A.BUSINESS_UNIT = L2.BUSINESS_UNIT
AND A.LEDGER = L2.LEDGER
AND A.ACCOUNT = L2.ACCOUNT
AND A.ALTACCT = L2.ALTACCT
AND A.DEPTID = L2.DEPTID
AND A.PROJECT_ID = L2.PROJECT_ID
AND A.DATE_CODE = L2.DATE_CODE
AND A.BOOK_CODE = L2.BOOK_CODE
AND A.GL_ADJUST_TYPE = L2.GL_ADJUST_TYPE
AND A.CURRENCY_CD = L2.CURRENCY_CD
AND A.STATISTICS_CODE = L2.STATISTICS_CODE
AND A.FISCAL_YEAR = L2.FISCAL_YEAR
AND A.ACCOUNTING_PERIOD = L2.ACCOUNTING_PERIOD
AND L2.ACCOUNTING_PERIOD = RG.PERIOD_TO
WHERE
A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND ( (A.ACCOUNTING_PERIOD BETWEEN 1 and 4
AND B.ACCOUNT_TYPE IN ('E','R') )
OR
(A.ACCOUNTING_PERIOD BETWEEN 0 and 4
AND B.ACCOUNT_TYPE IN ('A','L','Q') ) )
AND A.STATISTICS_CODE = ' '
AND A.ACCOUNT = '21101'
AND A.CURRENCY_CD <> ' '
AND A.CURRENCY_CD = 'GBP'
AND B.SETID='LTSHR'
AND B.ACCOUNT=A.ACCOUNT
AND B.SETID = SETID
AND B.EFFDT=(SELECT MAX(EFFDT) FROM PS_GL_ACCOUNT_TBL WHERE SETID='LTSHR' AND ACCOUNT=B.ACCOUNT AND EFFDT<='2015-01-31 00:00:00.000')
GROUP BY A.ACCOUNT
ORDER BY A.ACCOUNT

I'm inclined to suspect that you have simplified your original query too much to reflect the real problem, but I'm going to answer the question as posed, in light of the comments on it to this point.
Since your query does not in fact select anything derived from table L2, nor do any other predicates rely on anything from that table, the only thing accomplished by (left) joining it is to duplicate rows of the pre-aggregation results wherever more than one L2 row satisfies the join condition for the same row of A. That seems unlikely to be what you want, especially with that particular join being a self join, so I don't see any reason not to remove it altogether. Dollars to doughnuts, that solves the duplication problem.
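To illustrate how such a join inflates the sums, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The ledger table, its columns and its rows are hypothetical stand-ins for PS_LEDGER; whether the real data contains several matching rows per key is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature ledger: two rows share the same key in period 4.
conn.executescript("""
    CREATE TABLE ledger (account TEXT, period INTEGER, amount INTEGER);
    INSERT INTO ledger VALUES
        ('21101', 3, 100),
        ('21101', 4, 50),
        ('21101', 4, 70);
""")

# Without the self join, each ledger row is summed exactly once.
plain_total = conn.execute(
    "SELECT SUM(a.amount) FROM ledger a"
).fetchone()[0]

# With the self join restricted to period 4, each period-4 row of "a"
# matches both period-4 rows of "l2", so it appears twice before SUM runs.
joined_total = conn.execute("""
    SELECT SUM(a.amount)
    FROM ledger a
    LEFT JOIN ledger l2
           ON a.account = l2.account
          AND a.period  = l2.period
          AND l2.period = 4
""").fetchone()[0]

print(plain_total, joined_total)  # 220 340
```

The period-3 row is counted once in both queries, while the period-4 rows are counted twice in the joined one, which is the same shape of symptom as described in the question.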
I'm also going to suggest removing the correlated subquery in the WHERE clause in favor of joining an inline view, since you already join the base table for the subquery anyway. This particular inline view uses the window function version of MAX() instead of the aggregate function version. Ideally, it would directly select only the rows with the target EFFDT values, but it cannot do so without being rather more complicated, which is exactly what I am trying to avoid. The resulting query therefore filters EFFDT externally, as the original did, but without a correlated subquery.
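The window-function idea can be tried in isolation with a small sketch. It uses Python's sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, and the account_tbl table with its rows is a hypothetical stand-in for PS_GL_ACCOUNT_TBL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical effective-dated table: several versions per account.
conn.executescript("""
    CREATE TABLE account_tbl (account TEXT, effdt TEXT, account_type TEXT);
    INSERT INTO account_tbl VALUES
        ('21101', '2010-01-01', 'A'),
        ('21101', '2014-06-01', 'L'),
        ('21101', '2015-03-01', 'E'),
        ('30500', '2012-01-01', 'R');
""")

# The inline view tags every eligible row with the latest EFFDT of its
# account; the outer filter then keeps only the row carrying that date.
rows = conn.execute("""
    SELECT account, effdt, account_type
    FROM (
        SELECT account, effdt, account_type,
               MAX(effdt) OVER (PARTITION BY account) AS max_effdt
        FROM account_tbl
        WHERE effdt <= '2015-01-31'
    ) AS b
    WHERE effdt = max_effdt
    ORDER BY account
""").fetchall()

print(rows)  # [('21101', '2014-06-01', 'L'), ('30500', '2012-01-01', 'R')]
```

The row dated 2015-03-01 is excluded before the window is computed, so the latest remaining version of each account is the one that survives the outer filter.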
I furthermore removed a few redundant predicates and rewrote one of the messier ones to a somewhat nicer equivalent. And I reordered the predicates in a way that seems more logical to me.
Additionally, since you are filtering on a specific value of A.ACCOUNT, it is pointless (but not wrong) to ORDER BY that column, so I have removed that clause to make the query simpler and clearer. The GROUP BY has to stay, because SQL Server rejects a non-aggregated column such as A.ACCOUNT in the select list of an aggregate query unless that column appears in the GROUP BY.
Here's what I came up with:
SELECT
A.ACCOUNT,
SUM(A.POSTED_TRAN_AMT),
SUM(A.POSTED_BASE_AMT),
SUM(A.POSTED_TOTAL_AMT)
FROM
PS_LEDGER A
INNER JOIN (
SELECT
*,
MAX(EFFDT) OVER (PARTITION BY ACCOUNT) AS MAX_EFFDT
FROM PS_GL_ACCOUNT_TBL
WHERE
EFFDT <= '2015-01-31 00:00:00.000'
AND SETID = 'LTSHR'
) B
ON B.ACCOUNT=A.ACCOUNT
WHERE
A.ACCOUNT = '21101'
AND A.BUSINESS_UNIT = 'UK001'
AND A.LEDGER = 'LOCAL'
AND A.FISCAL_YEAR = 2015
AND A.CURRENCY_CD = 'GBP'
AND A.STATISTICS_CODE = ' '
AND B.EFFDT = B.MAX_EFFDT
AND A.ACCOUNTING_PERIOD BETWEEN
CASE
WHEN B.ACCOUNT_TYPE IN ('E','R') THEN 1
WHEN B.ACCOUNT_TYPE IN ('A','L','Q') THEN 0
END
AND 4
GROUP BY A.ACCOUNT

Related

sql counting the number is not working correctly

I'm running two related queries and the count comes out wrong. When I add a fourth join and one more condition, the count is wrong; without the fourth join and that condition it is correct. The first version works (result = 2):
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
GROUP BY
pxixolog_details.id
The second version does not count correctly (result = 4):
SELECT
pxixolog_details.*,
directions.direction,
COUNT(directions.direction) procent
FROM
pxixolog_details
LEFT JOIN psixologs_direction ON pxixolog_details.id = psixologs_direction.psixolog_id
LEFT JOIN directions ON directions.id = psixologs_direction.direction_id
LEFT JOIN psixologs_weeks ON pxixolog_details.id = psixologs_weeks.psixolog_id
LEFT JOIN psixologs_times ON pxixolog_details.id = psixologs_times.psixolog_id
WHERE
directions.direction IN(
'Трудности в отношениях',
'Проблемы со сном',
'Нежелательная агрессия'
)
AND birthday BETWEEN '1956-04-29' AND '2021-04-29' AND psixologs_weeks.week = '4'
AND (psixologs_times.time = '09:00' OR psixologs_times.time = '10:00')
GROUP BY
pxixolog_details.id
What am I doing wrong?
You get twice as many results with 4 JOINs because the new (4th) JOIN allows 2 records (09:00 and 10:00) for each record produced by the first 3 JOINs. That can lead to the observed result.
Check your data and make sure that your 4th JOIN condition yields a 1:1 record matching with the other data.
The last table, psixologs_times, matches multiple rows for each psixolog_id.
You can easily see this using a query:
select psixolog_id, count(*)
from psixologs_times
group by psixolog_id
having count(*) > 1;
How you fix this problem depends on what you want to do. The simplest solution is to use count(distinct):
COUNT(DISTINCT directions.direction) as procent
However, this might just be hiding the problem. You might want to choose one row from the psixologs_times table. Or pre-aggregate it. Or do something else.
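One possible "something else" is to test the times with EXISTS, so that the extra table filters rows without multiplying them. The sketch below runs on Python's sqlite3 module with hypothetical, simplified tables (details, detail_directions, detail_times) and made-up rows; it is an illustration of the fan-out, not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one person with two directions and two time slots.
conn.executescript("""
    CREATE TABLE details (id INTEGER PRIMARY KEY);
    CREATE TABLE detail_directions (detail_id INTEGER, direction TEXT);
    CREATE TABLE detail_times (detail_id INTEGER, time TEXT);
    INSERT INTO details VALUES (1);
    INSERT INTO detail_directions VALUES (1, 'sleep'), (1, 'anger');
    INSERT INTO detail_times VALUES (1, '09:00'), (1, '10:00');
""")

# Joining the times table directly: 2 directions x 2 times = 4 rows per id.
fanned_out = conn.execute("""
    SELECT COUNT(dd.direction)
    FROM details d
    JOIN detail_directions dd ON dd.detail_id = d.id
    JOIN detail_times dt ON dt.detail_id = d.id
    WHERE dt.time IN ('09:00', '10:00')
    GROUP BY d.id
""").fetchone()[0]

# EXISTS only asks whether a matching time row exists, so each direction
# row is still counted once.
with_exists = conn.execute("""
    SELECT COUNT(dd.direction)
    FROM details d
    JOIN detail_directions dd ON dd.detail_id = d.id
    WHERE EXISTS (
        SELECT 1
        FROM detail_times dt
        WHERE dt.detail_id = d.id
          AND dt.time IN ('09:00', '10:00')
    )
    GROUP BY d.id
""").fetchone()[0]

print(fanned_out, with_exists)  # 4 2
```

Unlike COUNT(DISTINCT ...), this form would also keep counting genuinely repeated directions correctly, if the data can contain them.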

Need help in optimizing sql query

I am new to SQL and have written the query below to fetch the required results. However, it is quite slow and takes ages to run. Any help with optimization would be great.
Below is the SQL query I am using:
SELECT
Date_trunc('week',a.pair_date) as pair_week,
a.used_code,
a.used_name,
b.line,
b.channel,
count(
case when b.sku = c.sku then used_code else null end
)
from
a
left join b on a.ma_number = b.ma_number
and (a.imei = b.set_id or a.imei = b.repair_imei
)
left join c on a.used_code = c.code
group by 1,2,3,4,5
I would rewrite the query as:
select Date_trunc('week',a.pair_date) as pair_week,
a.used_code, a.used_name, b.line, b.channel,
count(*) filter (where b.sku = c.sku)
from a left join
b
on a.ma_number = b.ma_number and
a.imei in ( b.set_id, b.repair_imei ) left join
c
on a.used_code = c.code
group by 1,2,3,4,5;
For this query, you want indexes on b(ma_number, set_id, repair_imei) and c(code, sku). However, the query doesn't leave much scope for optimization.
There might be some other possibilities, depending on the tables. For instance, OR/IN in the ON clause is usually a bad sign, but it is unclear what your intention really is.

Using a Correlated Subquery within a Left Join

I need a fresh set of eyes on this query. Without going into too much depth: I'm doing a left join to pull from the TXP_Digital_Signatures (tds) table, which stores signatures for the most current version of Treatment Plans (txp_master txp). The query brings back results where tds.signed is null (no signature) or marked N (No). This works, and the report shows people which No's need to become Yes. The problem is that the old No row is left behind: if a plan_id has a more recent Yes than that No, the query still pulls that plan_id, even though I no longer want it once the most recent signature status is Y (Yes). The code snippet below, added to the where clause, works, but it hides all No's even if there is not a newer Y (Yes).
tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
Can anyone think of a way to either add a correlated subquery to the left join, so that it only pulls the max(tds.date) for each tds.plan_id, or rework my where conditions so that the No's without a newer Yes and the nulls still show up? I really don't want to redo the entire report as a grouped report if I can help it; I feel it would break a ton of stuff and basically have me redoing this report from scratch. SQL 2008 R2
SELECT case_status,
CONVERT(CHAR(10), episode_open_date, 101)AS 'Enrolled' ,
txp.patient_id,
p.lname+', ' + p.fname AS 'Client',
CONVERT(CHAR(10), txp.effective_date, 101)AS 'Effective',
CONVERT(CHAR(10), next_review_date, 101)AS 'Review',
txp.signed,
(SELECT location_code FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Clinic',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = txp_coordinator_id) AS 'Coordinator',
(SELECT s.lname+', ' +s.fname FROM staff s WHERE s.staff_id = ts.team_member_id ) AS 'Team',
ts.signed,
tds.signed as 'Patient Sig'
FROM txp_master txp join patient p ON p.patient_id = txp.patient_id and p.episode_id = txp.episode_id
join txp_signature ts on ts.plan_id = txp.plan_id and ts.version_no = txp.version_no and ts.team_member_id <> txp.txp_coordinator_id
left join TXP_Digital_Signatures tds on tds.plan_id = txp.plan_id
where p.case_status = 'A' and
txp.status <> 'er' and patient_signed_date is null
and tds.signed is null or tds.signed = 'N'
and txp.effective_date > '2016-12-31 00:00:00.000'
and tds.date = (select Max(date) from TXP_Digital_Signatures where tds.plan_id = txp.plan_id)
order by patient_id
Your query should work if you correct the sub-query, like this:
(select Max(date) from TXP_Digital_Signatures x where x.plan_id = tds.plan_id)
Currently, the sub-query is not filtered on its own table, TXP_Digital_Signatures: its condition only compares the outer tds and txp rows.
One other thing to take note of is that you have a LEFT JOIN on TXP_Digital_Signatures tds yet you filter on it in the WHERE clause. A condition such as tds.date = ... rejects the rows where tds is NULL, which effectively converts the join to an INNER JOIN. So decide which join you require and change accordingly.
If you want results regardless of TXP_Digital_Signatures tds then move those conditions to the ON clause.
If you only want results based on TXP_Digital_Signatures tds then change to INNER JOIN.
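The difference between the two placements can be seen on a tiny data set. This is a sketch under assumptions: it runs on Python's sqlite3 module rather than SQL Server 2008 R2, and the plans and signatures tables with their rows are hypothetical stand-ins for txp_master and TXP_Digital_Signatures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plans (plan_id INTEGER PRIMARY KEY);
    CREATE TABLE signatures (plan_id INTEGER, signed TEXT, date TEXT);
    INSERT INTO plans VALUES (1), (2), (3);
    -- Plan 1: old N, newer Y. Plan 2: only N. Plan 3: no signature rows.
    INSERT INTO signatures VALUES
        (1, 'N', '2017-01-05'),
        (1, 'Y', '2017-02-01'),
        (2, 'N', '2017-01-10');
""")

# Corrected sub-query: correlated on the signature row, as in the answer.
latest = "(SELECT MAX(x.date) FROM signatures x WHERE x.plan_id = s.plan_id)"

# Date filter in WHERE: the NULL-extended row for plan 3 fails the
# comparison, so the LEFT JOIN behaves like an INNER JOIN.
in_where = conn.execute(f"""
    SELECT p.plan_id, s.signed
    FROM plans p
    LEFT JOIN signatures s ON s.plan_id = p.plan_id
    WHERE s.date = {latest}
    ORDER BY p.plan_id
""").fetchall()

# Date filter in ON: only the latest signature is joined, plans without
# any signature survive, and WHERE then keeps the NULL and 'N' cases.
in_on = conn.execute(f"""
    SELECT p.plan_id, s.signed
    FROM plans p
    LEFT JOIN signatures s
           ON s.plan_id = p.plan_id
          AND s.date = {latest}
    WHERE s.signed IS NULL OR s.signed = 'N'
    ORDER BY p.plan_id
""").fetchall()

print(in_where)  # [(1, 'Y'), (2, 'N')]
print(in_on)     # [(2, 'N'), (3, None)]
```

In the second query plan 1 disappears because its latest signature is a Y, while the plan with only an N and the plan with no signature at all both remain.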

Refactoring slow SQL query

I currently have this very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
And I'm trying to refactor and improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral(select case when exists(select * from run_generator_rows rgr where rgr.generator_row_id = gr.id) then 0 else 1 end as noRows) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For some reason it doesn't quite work as expected (it returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generator_id IN (SELECT g2.id
FROM generators g2
WHERE g2.client_id = 5212 AND
g2.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generator_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON gr.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generator_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(generator_row_id)
generator_rows(generator_id)
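The LEFT JOIN ... IS NULL pattern (an anti-join) can be compared with NOT IN on a small example. This sketch uses Python's sqlite3 module rather than PostgreSQL 9.6, and the rows are made up; in particular, the NULL in run_generator_rows is an assumption added to show a standard SQL pitfall, since nothing in the question says the real column is nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE generator_rows (id INTEGER PRIMARY KEY, generator_id INTEGER);
    CREATE TABLE run_generator_rows (generator_row_id INTEGER);
    INSERT INTO generator_rows VALUES (1, 10), (2, 10), (3, 20);
    -- Row 1 has been run; the NULL is a hypothetical stray value.
    INSERT INTO run_generator_rows VALUES (1), (NULL);
""")

# NOT IN against a list containing NULL never evaluates to true, so no
# row passes the filter.
not_in = conn.execute("""
    SELECT gr.id
    FROM generator_rows gr
    WHERE gr.id NOT IN (SELECT generator_row_id FROM run_generator_rows)
    ORDER BY gr.id
""").fetchall()

# Anti-join: keep the rows for which the LEFT JOIN found no partner.
anti_join = conn.execute("""
    SELECT gr.id
    FROM generator_rows gr
    LEFT JOIN run_generator_rows rgr ON rgr.generator_row_id = gr.id
    WHERE rgr.generator_row_id IS NULL
    ORDER BY gr.id
""").fetchall()

print(not_in)     # []
print(anti_join)  # [(2,), (3,)]
```

With no NULLs in run_generator_rows the two queries return the same ids; the anti-join form simply does not depend on that.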
Generally avoid inner selects as in
WHERE ... IN (SELECT ...)
as they are often slow.
As was already shown for your problem, it's a good idea to think of SQL in terms of set theory.
You do NOT join tables on their identity alone:
SQL conceptually takes the set (that is, all rows) of the first table and "multiplies" it with the set of the second table, ending up with n times m rows.
Then the ON clause is used to reduce the result (often strongly) by evaluating its condition for each of those combinations to either true (keep) or false (drop). This way you can choose any arbitrary logic to select the combinations you want.
Things get trickier with LEFT JOIN and RIGHT JOIN, but one can think of them as taking one side for granted. For each row of that side:
output the combinations of that row for which the logic yields true (if there is at least one), exactly like JOIN does
otherwise output exactly ONE row, with 'the other side' (the right side for LEFT JOIN and vice versa) consisting of NULL for every column.
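The model described above can be checked on two three-row tables. This is a minimal sketch using Python's sqlite3 module; the tables l and r and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (id INTEGER);
    CREATE TABLE r (id INTEGER);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (2), (2), (4);
""")

# The full product: 3 rows x 3 rows = 9 combinations.
product = conn.execute("SELECT COUNT(*) FROM l CROSS JOIN r").fetchone()[0]

# JOIN keeps only the combinations for which the ON condition is true.
inner = conn.execute("""
    SELECT l.id, r.id FROM l JOIN r ON l.id = r.id ORDER BY l.id
""").fetchall()

# LEFT JOIN additionally emits one NULL-extended row for every left row
# that had no true combination.
left = conn.execute("""
    SELECT l.id, r.id FROM l LEFT JOIN r ON l.id = r.id ORDER BY l.id
""").fetchall()

print(product)  # 9
print(inner)    # [(2, 2), (2, 2)]
print(left)     # [(1, None), (2, 2), (2, 2), (3, None)]
```

Left row 2 appears twice because it matches two right rows, which is the same mechanism that produces unexpected duplicates in aggregated queries.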
Count(*) is great too, but when things get complicated don't stick to it: use sub-selects for the keys only, and once all the hard work is done join the fun stuff to it. Like in
SELECT sub.ID, sub.VALID_COUNT, details.*
FROM (
SELECT ID, SUM(CASE WHEN X THEN 1 ELSE 0 END) AS VALID_COUNT
FROM ...
GROUP BY ID
) AS sub
JOIN ... AS details ON sub.ID = details.ID
The difference is that the inner query is executed only once. The outer query usually has no indexes left to work with and will be slow, but as long as the inner select doesn't make the data explode, this is usually many times faster than SELECT ... WHERE ... IN (SELECT ...) constructs.

Select Case value not sorting correctly in order by

I have a result set that has a defined grouping (for a report). A position may not be assigned, in which case it has no "Grid_Group" and I assign a value of 99. This works correctly except for the ORDER BY: the 99 always comes first when it should be last (if I sort DESC it is at the bottom). I've tried a cast on the select side as well as within the ORDER BY on Grid_Group, but both give the same result, with 99 at the top. (SQL Server 2008)
Here is the snippet; I removed all other unneeded columns.
SELECT s.SessionNumber,Position.PositionName,(Select CASE when dbo.Position.Grid_Group is null THEN 99 ELSE dbo.Position.Grid_Group END) as Grid_Group
FROM dbo.USession AS us Left Outer JOIN
dbo.Position ON us.PositionId = dbo.Position.PositionId FULL OUTER JOIN
dbo.Sessions AS s ON us.SessionId = s.SessionId
ORDER BY S.SessionNumber, dbo.Position.Grid_Group
Thoughts?
You need to apply the CASE on your order by as well (bear in mind this will ruin index utilization on the sort operation). Your ORDER BY is referencing the original table's column, not your alias result column. Something like this should do the trick:
SELECT
s.SessionNumber,Position.PositionName,
(CASE
WHEN dbo.Position.Grid_Group IS NULL THEN 99
ELSE dbo.Position.Grid_Group
END) AS Grid_Group
FROM dbo.USession AS us
LEFT OUTER JOIN dbo.Position ON us.PositionId = dbo.Position.PositionId
FULL OUTER JOIN dbo.Sessions AS s ON us.SessionId = s.SessionId
ORDER BY
S.SessionNumber,
(CASE
WHEN dbo.Position.Grid_Group IS NULL THEN 99
ELSE dbo.Position.Grid_Group
END)
I allowed myself to apply some minor formatting of your SQL.
An ORDER BY will in this case not see your calculated column, because it names the table column explicitly. To get the effect you're asking for, you'll have to ORDER BY the same expression (which kastermester's answer is demonstrating), or wrap the query in a common table expression and ORDER BY while selecting from that, something like:
WITH cte AS (
SELECT s.SessionNumber, p.PositionName, COALESCE(p.Grid_Group, 99) Grid_Group
FROM dbo.USession AS us
LEFT OUTER JOIN dbo.Position p ON us.PositionId = p.PositionId
FULL OUTER JOIN dbo.Sessions s ON us.SessionId = s.SessionId
)
SELECT * FROM cte ORDER BY SessionNumber, Grid_Group;
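The effect is easy to reproduce on a single table. This sketch uses Python's sqlite3 module rather than SQL Server 2008; both engines sort NULL first in ascending order, and the positions table with its rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE positions (name TEXT, grid_group INTEGER);
    INSERT INTO positions VALUES ('a', 2), ('b', NULL), ('c', 1);
""")

# Ordering by the qualified table column sorts on the raw values, so the
# NULL row (displayed as 99) comes first.
by_column = [row[0] for row in conn.execute("""
    SELECT COALESCE(p.grid_group, 99) AS grid_group
    FROM positions p
    ORDER BY p.grid_group
""")]

# Ordering by the same expression sorts on the substituted value.
by_expression = [row[0] for row in conn.execute("""
    SELECT COALESCE(p.grid_group, 99) AS grid_group
    FROM positions p
    ORDER BY COALESCE(p.grid_group, 99)
""")]

# Wrapping the query in a CTE makes the alias the only visible column.
by_cte = [row[0] for row in conn.execute("""
    WITH cte AS (
        SELECT COALESCE(p.grid_group, 99) AS grid_group
        FROM positions p
    )
    SELECT grid_group FROM cte ORDER BY grid_group
""")]

print(by_column)      # [99, 1, 2]
print(by_expression)  # [1, 2, 99]
print(by_cte)         # [1, 2, 99]
```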