Convert subselect to a join - sql

I gather that joins are preferred to sub-selects, but I can't see how to turn these 3 sub-selects into joins.
My sub-selects fetch only the first row.
I'm perfectly willing to leave this alone if it isn't offensive SQL.
This is my query (and yes, those really are the table and column names):
select x1.*, x2.KTNR, x3.J6NQ
from
(select D0HONB as HONB, D0HHNB as HHNB,
(
select DHHHNB
from ECDHREP
where DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
order by DHEJDT desc
FETCH FIRST 1 ROW ONLY
) as STC_HHNB,
(
select FIQ9NB
from DCFIREP
where FIQ7NB = D0Q7NB
AND FIBAEQ = D0ATEQ
and FISQCD = D0KNCD
and FIGZSZ in ('POS', 'ACT', 'MAN', 'HLD')
order by FIYCNB desc
FETCH FIRST 1 ROW ONLY
) as BL_Q9NB,
(
select AAKPNR
from C1AACPP
where AACEEQ = D0ATEQ and AARCCE = D0KNCD and AARDCE = D0KOCD
order by AAHMDT desc, AANENO desc
FETCH FIRST 1 ROW ONLY
) as NULL_KPNR
from ECD0REP
) as x1
left outer join (
select AAKPNR as null_kpnr, max(ABKTNR) as KTNR
from C1AACPP
left outer join C1ABCPP on AAKPNR = ABKPNR
group by AAKPNR
) as X2 on x1.NULL_KPNR = x2.null_KPNR
left outer join (
select ACKPNR as KPNR, count(*) as J6NQ
from C1ACCPP
WHERE ACJNDD = 'Y'
group by ACKPNR
) as X3 on x1.NULL_KPNR = x3.KPNR

You've got a combination of correlated subselects and nested table expressions (NTE).
Personally, I'd call it offensive if I had to maintain it. ;)
Consider common table expressions (CTEs) and joins. Without your data and table structure I can't give you the real statement, but the general form would look like this:
with
STC_HHNB as (
select DHHHNB, DHAOEQ, DHJRCD, DHEJDT
from ECDHREP )
, BL_Q9NB as ( <....>
where FIGZSZ in ('POS', 'ACT', 'MAN', 'HLD'))
<...>
select <...>
from stc_hhnb
join bl_q9nb on <...>
There are two important reasons to favor CTEs over NTEs: the results of a CTE can be reused, and it's easy to build a statement out of CTEs incrementally.
By reused, I mean you can have:
with
cte1 as (<...>)
, cte2 as (select <...> from cte1 join <...>)
, cte3 as (select <...> from cte1 join <...>)
, cte4 as (select <...> from cte2 join cte3 on <...>)
select * from cte4;
The optimizer can choose to build a temporary result set for cte1 and use it multiple times. From a building standpoint, you can see that each CTE builds on the preceding ones.
Here's a good article:
https://www.mcpressonline.com/programming/sql/simplify-sql-qwithq-common-table-expressions
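To see cte1 actually being referenced by two later CTEs, here is a small runnable sketch using Python's built-in sqlite3 module. The orders table, its columns and the sample rows are hypothetical; SQLite is used only because it ships with Python, and whether a given optimizer materializes cte1 once is engine-specific.

```python
import sqlite3


def cte_reuse_demo():
    """cte1 is defined once and read by both cte2 and cte3; cte4 combines them."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and data, for illustration only.
    con.execute("create table orders (customer text, amount integer)")
    con.executemany("insert into orders values (?, ?)",
                    [("a", 10), ("a", 5), ("b", 7), ("c", 1)])
    rows = con.execute("""
        with
        cte1 as (select customer, sum(amount) as total
                 from orders group by customer),
        cte2 as (select customer, total from cte1 where total >= 7),  -- first reuse
        cte3 as (select max(total) as top_total from cte1),          -- second reuse
        cte4 as (select cte2.customer, cte2.total, cte3.top_total
                 from cte2 cross join cte3)
        select * from cte4 order by customer
    """).fetchall()
    con.close()
    return rows


print(cte_reuse_demo())
```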
Edit
Let's dig into your first correlated sub-query.
select D0HONB as HONB, D0HHNB as HHNB,
(
select DHHHNB
from ECDHREP
where DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
order by DHEJDT desc
FETCH FIRST 1 ROW ONLY
) as STC_HHNB
from ECD0REP
What you're asking the DB to do is, for every row read in ECD0REP, go out and get a row from ECDHREP. If you're unlucky, the DB will have to read lots of records in ECDHREP to find that one row. In the worst case, a correlated sub-query reads every inner row for each outer row: with M rows in the outer query and N rows in the inner one, you're looking at M × N rows read.
I've seen this before, especially on IBM i, because that's how an RPG developer would do it:
read ECD0REP;
dow not %eof(ECD0REP);
//need to get DHHHNB from ECDHREP
chain (D0ATEQ, D0KNCD) ECDHREP;
STC_HHNB = DHHHNB;
read ECD0REP;
enddo;
But that's not the way to do it in SQL. SQL is (supposed to be) set-based.
So what you need to do is think about how to select the set of records from ECDHREP that will match up with the set of records you want from ECD0REP.
with cte1 as (
select DHHHNB, DHAOEQ, DHJRCD
from ECDHREP
)
select D0HONB as HONB
, D0HHNB as HHNB
, DHHHNB as STC_HHNB
from ECD0REP join cte1
on DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
Now maybe that's not quite correct. Perhaps there are multiple rows in ECDHREP with the same (DHAOEQ, DHJRCD) values, which is why you needed the FETCH FIRST in your correlated sub-query. Fine: you can focus on the CTE and figure out what needs to be done to get the one row you want. Perhaps MAX(DHHHNB) or MIN(DHHHNB) would work. If nothing else, you could use ROW_NUMBER() to pick out just one row:
with cte1 as (
select DHHHNB, DHAOEQ, DHJRCD
, row_number() over(partition by DHAOEQ, DHJRCD
order by DHEJDT desc) -- newest DHEJDT first, like the original ORDER BY
as rowNbr
from ECDHREP
), cte2 as (
select DHHHNB, DHAOEQ, DHJRCD
from cte1
where rowNbr = 1
)
select D0HONB as HONB
, D0HHNB as HHNB
, DHHHNB as STC_HHNB
from ECD0REP left join cte2
on DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
Now you're dealing with sets of records, joining them together for your final results.
Worst case, the DB has to read M + N records.
It's not really about performance; it's about thinking in sets.
Sure, with a simple statement using a correlated sub-query, the optimizer will probably be able to rewrite it into a join.
But it's best to write the best code you can, rather than hope the optimizer can correct it.
I've seen and rewritten queries with hundreds of correlated and regular sub-queries. In fact, I've seen a query that had to be broken into two because there were too many sub-queries: the DB has a limit of 256 per statement.
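A runnable sketch of this rewrite, using Python's built-in sqlite3 module with a few hypothetical rows. It assumes SQLite 3.25 or later (for ROW_NUMBER()) and uses LIMIT 1 where Db2 uses FETCH FIRST 1 ROW ONLY. The set-based version uses a LEFT JOIN with rowNbr = 1 in the ON clause, so an ECD0REP row with no match still comes back with a NULL STC_HHNB, just as the scalar sub-query returns it; an inner join would drop that row.

```python
import sqlite3

# Correlated scalar sub-query: one lookup into ECDHREP per ECD0REP row.
CORRELATED = """
    select D0HONB, D0HHNB,
           (select DHHHNB from ECDHREP
             where DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
             order by DHEJDT desc
             limit 1) as STC_HHNB      -- SQLite spelling of FETCH FIRST 1 ROW ONLY
    from ECD0REP
    order by D0HONB
"""

# Set-based: number the rows per key once, then join to the newest one.
SET_BASED = """
    with cte1 as (
        select DHHHNB, DHAOEQ, DHJRCD,
               row_number() over (partition by DHAOEQ, DHJRCD
                                  order by DHEJDT desc) as rowNbr
        from ECDHREP
    )
    select D0HONB, D0HHNB, DHHHNB as STC_HHNB
    from ECD0REP
    left join cte1
      on DHAOEQ = D0ATEQ and DHJRCD = D0KNCD and rowNbr = 1
    order by D0HONB
"""


def compare():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table ECD0REP (D0HONB integer, D0HHNB integer, D0ATEQ text, D0KNCD text);
        create table ECDHREP (DHHHNB integer, DHAOEQ text, DHJRCD text, DHEJDT text);
    """)
    # Hypothetical sample data.
    con.executemany("insert into ECD0REP values (?, ?, ?, ?)",
                    [(1, 100, "A", "X"), (2, 200, "B", "Y"), (3, 300, "C", "Z")])
    con.executemany("insert into ECDHREP values (?, ?, ?, ?)",
                    [(11, "A", "X", "2020-01-01"),
                     (12, "A", "X", "2021-06-30"),   # newest row for (A, X)
                     (21, "B", "Y", "2019-05-05")])  # (C, Z) has no match at all
    correlated = con.execute(CORRELATED).fetchall()
    set_based = con.execute(SET_BASED).fetchall()
    con.close()
    return correlated, set_based


print(compare())
```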

I'm going to have to differ with Charles here if the FETCH FIRST 1 ROW ONLY clauses are necessary. In this case you likely can't pull those sub-selects out into a CTE because that CTE would only have a single row in it. I suspect you could pull the outer sub-select into a CTE, but you would still need the sub-selects in the CTE. Since there appears to be no sharing, I would call this personal preference. BTW, I don't think pulling the sub-selects into a join will work for you either, in this case, for the same reason.
What is the difference between a sub-select and a CTE?
with mycte as (
select field1, field2
from mytable
where somecondition = true)
select *
from mycte
vs.
select *
from (select field1, field2
from mytable
where somecondition = true) a
It's really just personal preference. Depending on the specific requirements, a CTE can be referenced multiple times within the SQL statement, while a sub-select will be more correct in other cases, like the FETCH FIRST clause in your question.
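A quick check that the two forms return the same rows, sketched with Python's sqlite3 module. The sample data is made up, and somecondition is stored as 0/1 because older SQLite versions do not accept the TRUE literal.

```python
import sqlite3


def compare_cte_and_derived_table():
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (field1 integer, field2 text, somecondition integer)")
    con.executemany("insert into mytable values (?, ?, ?)",
                    [(1, "a", 1), (2, "b", 0), (3, "c", 1)])
    # Named CTE form.
    with_cte = con.execute("""
        with mycte as (
            select field1, field2 from mytable where somecondition = 1)
        select * from mycte order by field1
    """).fetchall()
    # Equivalent sub-select (derived table) form.
    derived = con.execute("""
        select * from (select field1, field2 from mytable
                       where somecondition = 1) a
        order by field1
    """).fetchall()
    con.close()
    return with_cte, derived


print(compare_cte_and_derived_table())
```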
EDIT
Let's look at the first sub-query. With the appropriate index:
(
select DHHHNB
from ECDHREP
where DHAOEQ = D0ATEQ and DHJRCD = D0KNCD
order by DHEJDT desc
FETCH FIRST 1 ROW ONLY
) as STC_HHNB,
only has to read one record per row in the output set. I don't think that is terribly onerous. This is the same for the third correlated sub-query as well.
The index for the first correlated sub-query would be:
create index ECDHREP_X1
on ECDHREP (DHAOEQ, DHJRCD, DHEJDT);
The second correlated sub-query might need more than one read per row, just because of the IN predicate, but it is far from needing a full table scan.
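One way to confirm that such an index turns the latest-row lookup into an index search rather than a table scan is to look at the query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for the Db2 tooling; the exact wording of the plan text varies between SQLite versions, so it only checks that the index name appears.

```python
import sqlite3


def plan_for_latest_row():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table ECDHREP (DHHHNB integer, DHAOEQ text, DHJRCD text, DHEJDT text);
        create index ECDHREP_X1 on ECDHREP (DHAOEQ, DHJRCD, DHEJDT);
    """)
    plan = con.execute("""
        explain query plan
        select DHHHNB from ECDHREP
        where DHAOEQ = ? and DHJRCD = ?
        order by DHEJDT desc
        limit 1
    """, ("A", "X")).fetchall()
    con.close()
    # The last column of each plan row is the human-readable detail text.
    return [row[-1] for row in plan]


for line in plan_for_latest_row():
    print(line)
```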

Related

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem. They're both remittances for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so that only ONE remittance will be "activated" per claim.
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all; it has been subsumed into the inline view. Note also that this will introduce one extra column (rn) into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
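A small runnable sketch of the ROW_NUMBER() approach using Python's sqlite3 module (SQLite 3.25+ assumed). It only models Claims_Group2 with the four columns listed in the question, leaves out the Remittance table and the ACTIVE filters, and uses made-up rows. Adding remittance_uuid as a second ORDER BY key is an extra choice not in the answers: it makes the tie-break between two remittances with the same timestamp deterministic instead of arbitrary.

```python
import sqlite3


def latest_remit_per_claim():
    con = sqlite3.connect(":memory:")
    con.execute("""create table claims_group2 (
        remittance_uuid text, claim_id integer,
        billed_amount real, create_datetime text)""")
    con.executemany("insert into claims_group2 values (?, ?, ?, ?)", [
        ("u1", 1, 50.0, "2018-08-15 13:07:50.933"),
        ("u2", 1, 50.0, "2018-08-15 13:07:50.933"),  # same claim, same timestamp
        ("u3", 2, 20.0, "2017-12-31 10:00:00.000"),
        ("u4", 2, 20.0, "2017-01-01 09:00:00.000"),
    ])
    rows = con.execute("""
        select claim_id, remittance_uuid, create_datetime
        from (
            select cg2.*,
                   row_number() over (partition by claim_id
                                      order by create_datetime desc,
                                               remittance_uuid) as rn  -- tie-breaker
            from claims_group2 cg2
            where billed_amount > 0
        ) t
        where rn = 1          -- exactly one remittance per claim
        order by claim_id
    """).fetchall()
    con.close()
    return rows


print(latest_remit_per_claim())
```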
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
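-- ROW_NUM = 1 already leaves one row per claim, so no GROUP BY is needed
-- (SQL Server's MAX() also rejects uniqueidentifier columns such as REMIT_UUID)
select CREATE_DATE, CLAIM_ID, REMIT_UUID
from #t
where ROW_NUM = 1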

Calculate MAX for every row in SQL

I have these tables:
Docenza(id, id_facolta, ..., orelez)
Facolta(id, ...)
and for every facolta I want to obtain only the id of the Docenza that has done the maximum number of orelez, together with that number:
id_docenzaP facolta1 max(orelez)
id_docenzaQ facolta2 max(orelez)
...
id_docenzaZ facoltaN max(orelez)
How can I do this? This is what I did:
SELECT DISTINCT ... F.nome, SUM(orelez) AS oreTotali
FROM Docenza D
JOIN Facolta F ON F.id = D.id_facolta
GROUP BY F.nome
I obtain something like:
docenzaP facolta1 maxValueForidP
docenzaQ facolta1 maxValueForidQ
...
docenzaR facolta2 maxValueForidR
docenzaS facolta2 maxValueForidS
...
docenzaZ facoltaN maxValueForFacoltaN
How can I take only the max value for every facolta?
Presumably, you just want:
SELECT F.nome, sum(orelez) AS oreTotali
FROM Docenza D JOIN
Facolta F
ON F.id = D.id_facolta
GROUP BY F.nome;
I'm not sure what the SELECT DISTINCT is supposed to be doing. It is almost never used with GROUP BY. The . . . suggests that you are selecting additional columns, which are not needed for the results you want.
This is untested, and since you didn't provide sample data with expected results I can't be sure it's really what you need.
It's a bit ugly and I'm sure there is some clever correlated sub query approach, but I've never been good with those.
SELECT TMP2.focolta,
TMP2.s_orelez,
TMP2.id_docenza
FROM (SELECT focolta,
s_orelez,
id_docenza,
ROW_NUMBER() OVER -- Rank each docenza by its orelez total within its focolta.
( PARTITION BY focolta
ORDER BY s_orelez DESC
) rn_orelez
FROM (SELECT focolta,
id_docenza,
SUM(orelez) AS s_orelez -- Sum the orelez per docenza within each focolta.
FROM some_table
GROUP BY focolta, id_docenza
) TMP
) TMP2
WHERE TMP2.rn_orelez = 1; -- Keep only the highest-ranked docenza per focolta.
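The same greatest-per-group idea, sketched against the question's Docenza and Facolta tables with Python's sqlite3 module (SQLite 3.25+ assumed). It assumes that rows sharing a Docenza id should be summed, as the question's SUM(orelez) suggests, and the sample rows are invented. ROW_NUMBER() keeps exactly one docenza per facolta even on a tie; RANK() would keep all tied ones.

```python
import sqlite3


def top_docenza_per_facolta():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table Facolta (id integer primary key, nome text);
        create table Docenza (id integer, id_facolta integer, orelez integer);
    """)
    con.executemany("insert into Facolta values (?, ?)", [(1, "F1"), (2, "F2")])
    con.executemany("insert into Docenza values (?, ?, ?)", [
        (10, 1, 5), (10, 1, 7),   # docenza 10 in F1: 12 hours in total
        (11, 1, 9),               # docenza 11 in F1: 9 hours
        (20, 2, 4), (21, 2, 6),   # F2: docenza 20 -> 4, docenza 21 -> 6
    ])
    rows = con.execute("""
        with totals as (            -- one row per (facolta, docenza)
            select D.id_facolta, D.id as id_docenza, sum(D.orelez) as ore
            from Docenza D
            group by D.id_facolta, D.id
        ), ranked as (              -- rank docenze inside each facolta
            select totals.*,
                   row_number() over (partition by id_facolta
                                      order by ore desc) as rn
            from totals
        )
        select F.nome, r.id_docenza, r.ore
        from ranked r join Facolta F on F.id = r.id_facolta
        where r.rn = 1
        order by F.nome
    """).fetchall()
    con.close()
    return rows


print(top_docenza_per_facolta())
```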

SQL subquery returns just one value. How can I make this code efficient?

select count(*) as CountId, [FirstRouteNo],[ThroughRouteSid],[LastRouteNo],
(select top 1 [ThroughRouteJson]
from DirectTransfer as Subquery
where MainQuery.FirstRouteNo=Subquery.FirstRouteNo and
MainQuery.ThroughRouteSid = Subquery.ThroughRouteSid and
MainQuery.LastRouteNo = Subquery.LastRouteNo
) as DetailJson,
(select top 1 RouteMeter
from DirectTransfer as Subquery
where MainQuery.FirstRouteNo = Subquery.FirstRouteNo and
MainQuery.ThroughRouteSid = Subquery.ThroughRouteSid and
MainQuery.LastRouteNo = Subquery.LastRouteNo
) as RouteMeter
from DirectTransfer as MainQuery
group by MainQuery.[FirstRouteNo],MainQuery.[ThroughRouteSid],MainQuery.[LastRouteNo]
order by CountId desc
I want to group by the columns [FirstRouteNo], [ThroughRouteSid], [LastRouteNo] and count how many records are in each group, but I also want to show the [ThroughRouteJson] and [RouteMeter] values from any one of the records. The [ThroughRouteJson] and [RouteMeter] values differ slightly between records, so I can't group by them, and a subquery can return only one value, so I wrote two subqueries to get what I want. My table has more than 100 million records, so I want this to be efficient. How can I make this code more efficient while still getting the same result data?
I would suggest you do this as:
select dt.*, dt2.DetailJson, dt2.RouteMeter
from (select count(*) as cnt, dt.FirstRouteNo, dt.ThroughRouteSid, dt.LastRouteNo
from DirectTransfer dt
group by dt.FirstRouteNo, dt.ThroughRouteSid, dt.LastRouteNo
) dt outer apply
(select top 1 ThroughRouteJson as DetailJson, RouteMeter
from DirectTransfer dt2
where dt2.FirstRouteNo = dt.FirstRouteNo and
dt2.ThroughRouteSid = dt.ThroughRouteSid and
dt2.LastRouteNo = dt.LastRouteNo
) dt2
order by cnt desc;
You want indexes on DirectTransfer(FirstRouteNo, ThroughRouteSid, LastRouteNo). There might be other ways to accomplish what you want, but it is a bit unclear what you are trying to do.
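SQLite has no OUTER APPLY, so this sketch swaps in a different technique: window functions that count each group and pick one representative row in a single pass over DirectTransfer. It uses Python's sqlite3 module (SQLite 3.25+ assumed) and made-up rows; ordering by RouteMeter inside ROW_NUMBER() is an arbitrary assumption to make "any one of the records" deterministic. Whether this beats the OUTER APPLY version on 100 million rows depends on indexes and the optimizer, so measure it on the real data.

```python
import sqlite3


def route_groups():
    con = sqlite3.connect(":memory:")
    con.execute("""create table DirectTransfer (
        FirstRouteNo text, ThroughRouteSid text, LastRouteNo text,
        ThroughRouteJson text, RouteMeter integer)""")
    con.executemany("insert into DirectTransfer values (?, ?, ?, ?, ?)", [
        ("1", "s1", "9", '{"v":1}', 100),
        ("1", "s1", "9", '{"v":2}', 101),
        ("1", "s1", "9", '{"v":3}', 102),
        ("2", "s2", "8", '{"v":4}', 200),
    ])
    rows = con.execute("""
        select CountId, FirstRouteNo, ThroughRouteSid, LastRouteNo,
               ThroughRouteJson as DetailJson, RouteMeter
        from (
            select dt.*,
                   -- group size, repeated on every row of the group
                   count(*) over (partition by FirstRouteNo, ThroughRouteSid,
                                               LastRouteNo) as CountId,
                   -- 1 for the single representative row of each group
                   row_number() over (partition by FirstRouteNo, ThroughRouteSid,
                                                   LastRouteNo
                                      order by RouteMeter) as rn
            from DirectTransfer dt
        ) t
        where rn = 1
        order by CountId desc
    """).fetchall()
    con.close()
    return rows


print(route_groups())
```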

Ordering a SQL query based on the value in a column determining the value of another column in the next row

My table looks like this:
Value Previous Next
37 NULL 42
42 37 3
3 42 79
79 3 NULL
Except that the table is all out of order. (There are no duplicates, so that is not an issue.) I was wondering if there was any way to write a query that would order the output, basically saying "next row's Value = this row's Next", as shown above.
I have no control over the database or how this data is stored; I'm just trying to retrieve it and organize it. It's SQL Server 2008, I believe.
I realize that this wouldn't be difficult to reorganize afterwards, but I was just curious if I could write a query that just did that out of the box so I wouldn't have to worry about it.
This should do what you need:
WITH CTE AS (
SELECT YourTable.*, 0 Depth
FROM YourTable
WHERE Previous IS NULL
UNION ALL
SELECT YourTable.*, Depth + 1
FROM YourTable JOIN CTE
ON YourTable.Value = CTE.Next
)
SELECT * FROM CTE
ORDER BY Depth;
[SQL Fiddle] (Referential integrity and indexes omitted for brevity.)
We use a recursive common table expression (CTE) to travel from the head of the list (WHERE Previous IS NULL) to the trailing nodes (ON YourTable.Value = CTE.Next) and at the same time memorize the depth of the recursion that was needed to reach the current node (in Depth).
In the end, we simply sort by the depth of recursion that was needed to reach each of the nodes (ORDER BY Depth).
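A runnable version of this recursive CTE using Python's sqlite3 module, with the question's four rows inserted out of order. SQLite (like standard SQL) spells it WITH RECURSIVE, while SQL Server uses plain WITH; otherwise the query mirrors the one in this answer.

```python
import sqlite3


def ordered_list():
    con = sqlite3.connect(":memory:")
    con.execute("create table YourTable (Value integer, Previous integer, Next integer)")
    # The question's rows, deliberately inserted out of order.
    con.executemany("insert into YourTable values (?, ?, ?)",
                    [(3, 42, 79), (79, 3, None), (37, None, 42), (42, 37, 3)])
    rows = con.execute("""
        with recursive CTE as (
            select YourTable.*, 0 as Depth        -- head of the list
            from YourTable
            where Previous is null
            union all
            select YourTable.*, CTE.Depth + 1     -- follow the Next pointer
            from YourTable join CTE on YourTable.Value = CTE.Next
        )
        select Value, Previous, Next from CTE order by Depth
    """).fetchall()
    con.close()
    return rows


print(ordered_list())
```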
Use a recursive query. With the one I list here, you can have multiple paths along your linked list:
with cte (Value, Previous, Next, Level)
as
(
select Value, Previous, Next, 0 as Level
from data
where Previous is null
union all
select d.Value, d.Previous, d.Next, Level + 1
from data d
inner join cte c on d.Previous = c.Value
)
select * from cte
If you are using Oracle, try START WITH ... CONNECT BY:
select ... start with initial-condition connect by
nocycle recursive-condition;
EDIT: For SQL Server, use the WITH syntax as below:
WITH rec(value, previous, next) AS
(SELECT value, previous, next
FROM table1
WHERE previous is null
UNION ALL
SELECT nextRec.value, nextRec.previous, nextRec.next
FROM table1 as nextRec, rec
WHERE rec.next = nextRec.value)
SELECT value, previous, next FROM rec;
One way to do this is with a join:
select t.*
from t left outer join
t tnext
on t.next = tnext.value
order by tnext.value
However, won't this do?
select t.*
from t
order by t.next
Something like this should work:
With Parent As (
Select
Value,
Previous,
Next
From
table
Where
Previous Is Null
Union All
Select
t.Value,
t.Previous,
t.Next
From
table t
Inner Join
Parent
On Parent.Next = t.Value
)
Select
*
From
Parent

Duplicate results returned from query when distinct is used

On a current project I need to paginate results returned from SQL. I have hit a corner case: the query can accept identifiers as part of the WHERE clause. Normally this isn't an issue, but in one case a single identifier being passed in has a one-to-many relationship with one of the tables the query joins on, so the query returns multiple rows. That issue was fixed by introducing a DISTINCT into the query. The following query returns the correct result of one row (all table and field names have been changed, of course):
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
However, when I introduce pagination into the query, it returns multiple rows when it should return only one. The method I am using for pagination is as follows:
select [item_id]
from (
select distinct [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
) as [items]
where [items].[row_num] between 0 and 100
I worry that adding a distinct to the outer query will cause an incorrect number of results to be returned and I am unsure of how else to fix this issue. The database I am querying is MS SQL Server 2008.
About 5 minutes after posting the question, a possible solution hit me: grouping by item_id (and any sort criteria), so that there is only one instance of each item, should solve the issue. After testing, this was the query I was left with:
select [item_id]
from (
select [item_table].[item_id]
, row_number() over (order by [item_table].[pub_date] desc, [item_table].[item_id]) as [row_num]
from [item_table]
join [OneToOneRelationShip] on [OneToOneRelationShip].[other_id] = [item_table].[other_id]
left join [OneToNoneOrManyRelationship] on [OneToNoneOrManyRelationship].[item_id] = [item_table].[item_id]
where [item_table].[pub_item_web] = 1
and [item_table].[live_item] = 1
and [item_table].[item_id] in (1404309)
group by [item_table].[item_id], [item_table].[pub_date]
) as [items]
where [items].[row_num] between 0 and 100
I don't see where the DISTINCT is adding any value in your first query. The results are [item_table].[item_id] and [row_num]. Because the value of [row_num] is already distinct, the combination of [item_table].[item_id] and [row_num] will be distinct. When adding the DISTINCT keyword to the query, no rows are excluded.
In the second query, your results return [item_id] from the sub-query wherever [row_num] meets the criteria. If there were duplicate [item_id] values in the sub-query, there will be duplicates in the final results, but now you don't display [row_num] to distinguish them.
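A small sqlite3 sketch (Python, SQLite 3.25+ assumed) of this effect: because ROW_NUMBER() is computed before DISTINCT is applied, each joined row gets its own row_num and DISTINCT removes nothing, while GROUP BY collapses the one-to-many fan-out before the numbering. The many_side table is a hypothetical stand-in for [OneToNoneOrManyRelationship], and the data is made up.

```python
import sqlite3


def pagination_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table item_table (item_id integer, pub_date text);
        create table many_side (item_id integer, tag text);
    """)
    con.execute("insert into item_table values (1404309, '2018-01-01')")
    con.executemany("insert into many_side values (?, ?)",
                    [(1404309, "a"), (1404309, "b"), (1404309, "c")])
    # DISTINCT runs after ROW_NUMBER(); every row_num differs, so all 3 rows survive.
    with_distinct = con.execute("""
        select distinct i.item_id,
               row_number() over (order by i.pub_date desc, i.item_id) as row_num
        from item_table i left join many_side m on m.item_id = i.item_id
        order by row_num
    """).fetchall()
    # GROUP BY collapses the join fan-out first, so ROW_NUMBER() sees one row.
    with_group_by = con.execute("""
        select i.item_id,
               row_number() over (order by i.pub_date desc, i.item_id) as row_num
        from item_table i left join many_side m on m.item_id = i.item_id
        group by i.item_id, i.pub_date
        order by row_num
    """).fetchall()
    con.close()
    return with_distinct, with_group_by


print(pagination_demo())
```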