Optimization DB2 query with mass join - sql

I have complex query:
select rma.RELATION_MANAGER_ID,
rm.ORG_STRUCTURE_ID,
rm.RELATIONSHIP_MANAGER_NM,
count(distinct ppa.PARTY_ID) as count_party
from RELATIONSHIP_MANAGER rm --15808 row
join RELATIONSHIP_MANAGER_MARKET rmm --1560 row
on rm.RELATIONSHIP_MANAGER_ID = rmm.RELATIONSHIP_MANAGER_ID
and rmm.INCLUDE_IN_REPORT = 'Y'
join MARKET_SEGMENT rm_ms --4 row
on rmm.MARKET_SEGMENT_ID = rm_ms.MARKET_SEGMENT_ID
and rm_ms.MARKET_SEGMENT = '01'
join RELATIONSHIP_MANAGER_ALLOCATION rma --61349 row
on rm.RELATIONSHIP_MANAGER_ID = rma.RELATIONSHIP_MANAGER_ID
join CMD_PARTY_PORTFOLIO_ALLOCATION ppa --3114096 row
on ppa.PORTFOLIO_ID = rma.PORTFOLIO_ID
join person ps --3112575 row
on ps.IS_DELETED != 1 and ppa.party_id = ps.party_id
join PARTY p --3114146 row
on ppa.party_id=p.party_id
join MARKET_SEGMENT ms --4 row
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID and ms.MARKET_SEGMENT = '01'
where rm.IS_CM = 1 and rm.IS_DELETED != 1
group by rm.RELATIONSHIP_MANAGER_NM, rma.RELATIONSHIP_MANAGER_ID, rm.ORG_STRUCTURE_ID
Table columns have indexes:
rm.RELATIONSHIP_MANAGER_ID,
rmm.RELATIONSHIP_MANAGER_ID,
rmm.MARKET_SEGMENT_ID,
rm_ms.MARKET_SEGMENT_ID,
rma.RELATIONSHIP_MANAGER_ID,
ppa.PORTFOLIO_ID,
rma.PORTFOLIO_ID,
ppa.party_id,
ps.party_id,
p.party_id,
p.MARKET_SEGMENT_ID,
ms.MARKET_SEGMENT_ID
tables PARTY, PERSON have ~1-3 million row,
runtime of query ~20second. I am comment
join MARKET_SEGMENT ms
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID --and ms.MARKET_SEGMENT = '01'
runtime of query became ~3 second.
Explain why this is happening, please ?
Explain plan dont help me.. How i can optimization the query?
EDIT:
platform is DB2 for z/OS V9.7,
added size of table
EDIT2: explain plan shows that the first is always join the small size of the table

Just for grins, see if this makes any difference:
WITH
MktSeg( MARKET_SEGMENT_ID ) AS
( SELECT MARKET_SEGMENT_ID
FROM MARKET_SEGMENT
WHERE MARKET_SEGMENT = '01' )
select rma.RELATION_MANAGER_ID,
rm.ORG_STRUCTURE_ID,
rm.RELATIONSHIP_MANAGER_NM,
count(distinct ppa.PARTY_ID) as count_party
from RELATIONSHIP_MANAGER rm --15808 row
join RELATIONSHIP_MANAGER_MARKET rmm --1560 row
on rm.RELATIONSHIP_MANAGER_ID = rmm.RELATIONSHIP_MANAGER_ID
and rmm.INCLUDE_IN_REPORT = 'Y'
join MktSeg rm_ms --4 row
on rmm.MARKET_SEGMENT_ID = rm_ms.MARKET_SEGMENT_ID
join RELATIONSHIP_MANAGER_ALLOCATION rma --61349 row
on rm.RELATIONSHIP_MANAGER_ID = rma.RELATIONSHIP_MANAGER_ID
join CMD_PARTY_PORTFOLIO_ALLOCATION ppa --3114096 row
on ppa.PORTFOLIO_ID = rma.PORTFOLIO_ID
join person ps --3112575 row
on ps.IS_DELETED != 1 and ppa.party_id = ps.party_id
join PARTY p --3114146 row
on ppa.party_id=p.party_id
join MktSeg ms --4 row
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID
WHERE rm.IS_CM = 1 AND rm.IS_DELETED != 1
group by rm.RELATIONSHIP_MANAGER_NM, rma.RELATIONSHIP_MANAGER_ID,
rm.ORG_STRUCTURE_ID, rma.RELATION_MANAGER_ID
Note also that I've added an item to the Group By clause.

Related

Athena query with joins

I am trying to run query on Athena which is not behaving as expected:
select distinct aw.year, aw.month, aw.day
from aw
left join f on aw.c1= f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
and aw.year = '2022'
and aw.month = '2'
and aw.day = '5'
I am expecting this query to return 2022, 2, 5 but it is returning several other values of year, month and day. The below query works fine and returns only 2022, 2, 5.
select distinct aw.year, aw.month, aw.day
from aw
join f on aw.c1= f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
and aw.year = '2022'
and aw.month = '2'
and aw.day = '5'
The problem is when I add left join but I am also adding the filer of required year, month and day. I am doing something wrong logically here?
You appear to be confusing conditions within an ON with conditions in a WHERE.
Your left join query returned rows where aw had other values because you were providing conditions for how to join the rows between tables. It does not limit the rows from the 'left' table.
If you only want to return rows where the LEFT table (aw) date matches 2022-02-05, then you should put those conditions in a WHERE:
select distinct aw.year, aw.month, aw.day
from aw
left join f on aw.c1 = f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
where aw.year = '2022'
and aw.month = '2'
and aw.day = '5'

T-SQL Full Outer Join (sample provided)

I have some issues getting a full outer join to work in T-SQL. The left outer join part seems to be working okay, but the right outer join doesn't work as expected. Here's some sample data to test this on:
I have a table A with the following columns and data. The row marked with red is the row that cannot be found in table B.
And a second table B with the following columns and data. The rows marked with yellow are the rows that cannot be found in table A.
I am trying to join the tables using the following sql code:
select tableA.klientnr, tableA.uttakstype, tableA.uttaksnr, tableA.vareanr TableAItem,tableB.vareanr tableBitem, tableA.kvantum tableAquantity, tableB.totkvant tableBquantity
from tableA as tableA
full outer join tableB as tableB on tableA.klientnr=tableB.klientnr and tableA.uttakstype=tableB.uttakstype and tableA.uttaksnr=tableB.uttaksnr and tableA.vareanr=tableB.vareanr and tableB.IsDeleted=0
where tableA.UttaksNr=639779 and tableA.IsDeleted=0
The result of the sql is the following image. The row marked in red is the extra row from tableA that does show up, but I can't get the rows from table B to show up
Expected to have 2 extra rows
550 SA 639779 NULL 100059 NULL 0
550 SA 639779 NULL 103040 NULL 14
Later edit:
Would this be correct way to handle the full outer join where there's the header/line type of structure? Or can the query be optimized?
SELECT ISNULL(q1.accountid, q2.accountid) AccountId
,ISNULL(q1.klientnr, q2.klientnr) KlientNr
,ISNULL(q1.tilgangstype, q2.tilgangstype) 'Reception Type'
,ISNULL(q1.tilgangsnr, q2.tilgangsnr) 'Reception No'
,ISNULL(q1.dato, q2.dato) dato
,ISNULL(q1.LevNr, q2.LevNr) LevNr
,ISNULL(q1.Pakkemerke, q2.Pakkemerke) Pakkemerke
,ISNULL(q1.VareANr, q2.VareANr) VareANr
,ISNULL(q1.Ankomstdato,q2.Ankomstdato) 'Arrival Date'
,q1.Antall1
,q1.totkvant1
,q1.Antall2
,q1.totkvant2
,q2.Antall
,q2.totkvant
,q2.AntallTilFrys
,q2.TotKvantTilFrys
,ISNULL(q1.EksternKommentar1,q2.EksternKommentar1) EksternKommentar1
,q2.[Last Upsert]
FROM (
SELECT w700.accountid
,w700.klientnr
,w700.tilgangstype
,w700.tilgangsnr
,w700.dato
,w700.Ankomstdato
,w700.LevNr
,w700.pakkemerke
,w789.VareANr
,sum(IIF(w789.prognosetype = 1, w789.Antall, NULL)) AS Antall1
,sum(IIF(w789.prognosetype = 1, w789.totkvant, NULL)) AS totkvant1
,sum(IIF(w789.prognosetype = 2, w789.Antall, NULL)) AS Antall2
,sum(IIF(w789.prognosetype = 2, w789.totkvant, NULL)) AS totkvant2
,w700.EksternKommentar1
FROM trading.W789Prognosekjopstat AS w789
INNER JOIN trading.W700Tilgangshode AS w700 ON w700.AccountId = w789.AccountId
AND w700.KlientNr = w789.Klientnr
AND w700.Tilgangsnr = w789.Tilgangsnr
AND w700.Tilgangstype = w789.Tilgangstype
AND w700.IsDeleted = 0
WHERE w789.IsDeleted = 0
GROUP BY w700.accountid
,w700.klientnr
,w700.tilgangstype
,w700.tilgangsnr
,w700.dato
,w700.Ankomstdato
,w700.LevNr
,w700.pakkemerke
,w789.VareANr
,w700.EksternKommentar1
) q1
FULL OUTER JOIN (
SELECT w700.accountid
,w700.klientnr
,w700.tilgangstype
,w700.tilgangsnr
,w700.dato
,w700.Ankomstdato
,w700.LevNr
,w700.pakkemerke
,w702.VareANr
,w702.Antall
,w702.TotKvant
,w702.ValPris
,w702.AntallTilFrys
,w702.TotKvantTilFrys
,w700.EksternKommentar1
,(SELECT MAX(LastUpdateDate) FROM (VALUES (w702.createdAt),(w702.updatedAt)) AS UpdateDate(LastUpdateDate)) AS 'Last Upsert'
FROM trading.w702PrognoseKjop w702
INNER JOIN trading.W700Tilgangshode AS w700 ON w700.AccountId = w702.AccountId
AND w700.KlientNr = w702.Klientnr
AND w700.Tilgangsnr = w702.Tilgangsnr
AND w700.Tilgangstype = w702.Tilgangstype
AND w700.IsDeleted = 0
WHERE w702.IsDeleted = 0
) q2 ON q1.accountid = q2.accountid
AND q1.klientnr = q2.klientnr
AND q1.tilgangstype = q2.tilgangstype
AND q1.tilgangsnr = q2.tilgangsnr
AND q1.vareanr = q2.vareanr
WHERE totkvant1 IS NOT NULL
OR totkvant2 IS NOT NULL
OR totkvant IS NOT NULL
Filtering with full join is really tricky. Your where criteria are actually turning the full join into a left join.
You can do what you want by filtering before the join:
select a.klientnr, a.uttakstype, a.uttaksnr, a.vareanr a.tableAitem,
b.vareanr b.tableBitem, a.kvantum a.tableAquantity, b.totkvant b.tableBquantity
from (select a.*
from tableA a
where a.UttaksNr = 639779 and a.IsDeleted = 0
) a full join
(select b.*
from tableB b
where b.IsDeleted = 0
) b
on a.klientnr = b.klientnr and
a.uttakstype= b.uttakstype and
a.uttaksnr = b.uttaksnr and
a.vareanr = b.vareanr;

SQL statement merge two rows into one

In the results of my sql-statement (SQL Server 2016) I would like to combine two rows with the same value in two columns ("study_id" and "study_start") into one row and keep the row with higest value in a third cell ("Id"). If any columns (i.e. "App_id" or "Date_arrival) in the row with higest Id is NULL, then it should take the value from the row with the lowest "Id".
I get the result below:
Id study_id study_start Code Expl Desc Startmonth App_id Date_arrival Efter_op Date_begin
167262 878899 954 4.1 udd.ord Afbrudt feb 86666 21-06-2012 N 17-08-2012
180537 878899 954 1 Afsluttet Afsluttet feb NULL NULL NULL NULL
And I would like to get this result:
Id study_id study_start Code Expl Desc Startmonth App_id Date_arrival Efter_op Date_begin
180537 878899 954 1 Afsluttet Afsluttet feb 86666 21-06-2012 N 17-08-2012
My statement looks like this:
SELECT dbo.PopulationStam_V.ELEV_ID AS id,
dbo.PopulationStam_V.PERS_ID AS study_id,
dbo.STUDIESTARTER.STUDST_ID AS study_start,
dbo.Optagelse_Studiestatus.AFGANGSARSAG AS Code,
dbo.Optagelse_Studiestatus.KORT_BETEGNELSE AS Expl,
ISNULL((CAST(dbo.Optagelse_Studiestatus.Studiestatus AS varchar(20))), 'Indskrevet') AS 'Desc',
dbo.STUDIESTARTER.OPTAG_START_MANED AS Startmonth,
dbo.ANSOGNINGER.ANSOG_ID as App_id,
dbo.ANSOGNINGER.ANKOMSTDATO AS Data_arrival',
dbo.ANSOGNINGER.EFTEROPTAG AS Efter_op,
dbo.ANSOGNINGER.STATUSDATO AS Date_begin
FROM dbo.INSTITUTIONER
INNER JOIN dbo.PopulationStam_V
ON dbo.INSTITUTIONER.INST_ID = dbo.PopulationStam_V.SEMI_ID
LEFT JOIN dbo.ANSOGNINGER
ON dbo.PopulationStam_V.ELEV_ID = dbo.ANSOGNINGER.ELEV_ID
INNER JOIN dbo.STUDIESTARTER
ON dbo.PopulationStam_V.STUDST_ID_OPRINDELIG = dbo.STUDIESTARTER.STUDST_ID
INNER JOIN dbo.UDD_NAVNE_T
ON dbo.PopulationStam_V.UDDA_ID = dbo.UDD_NAVNE_T.UDD_ID
INNER JOIN dbo.UDDANNELSER
ON dbo.UDD_NAVNE_T.UDD_ID = dbo.UDDANNELSER.UDDA_ID
LEFT OUTER JOIN dbo.PERSONER
ON dbo.PopulationStam_V.PERS_ID = dbo.PERSONER.PERS_ID
LEFT OUTER JOIN dbo.POSTNR
ON dbo.PERSONER.PONR_ID = dbo.POSTNR.PONR_ID
LEFT OUTER JOIN dbo.KønAlleElevID_V
ON dbo.PopulationStam_V.ELEV_ID = dbo.KønAlleElevID_V.ELEV_ID
LEFT OUTER JOIN dbo.Optagelse_Studiestatus
ON dbo.PopulationStam_V.AFAR_ID = dbo.Optagelse_Studiestatus.AFAR_ID
LEFT OUTER JOIN dbo.frafaldsmodel_adgangsgrundlag
ON dbo.frafaldsmodel_adgangsgrundlag.ELEV_ID = dbo.PopulationStam_V.ELEV_ID
LEFT OUTER JOIN dbo.Optagelse_prioriteterUFM
ON dbo.Optagelse_prioriteterUFM.cpr = dbo.PopulationStam_V.CPR_NR
AND dbo.Optagelse_prioriteterUFM.Aar = dbo.frafaldsmodel_adgangsgrundlag.optagelsesaar
LEFT OUTER JOIN dbo.frafaldsmodel_stoettetabel_uddannelser AS fsu
ON fsu.id_uddannelse = dbo.UDDANNELSER.UDDA_ID
AND fsu.id_inst = dbo.INSTITUTIONER.INST_ID
AND fsu.uddannelse_aar = dbo.frafaldsmodel_adgangsgrundlag.optagelsesaar
WHERE dbo.STUDIESTARTER.STUDIESTARTSDATO > '2012-03-01 00:00:00.000'
AND (dbo.Optagelse_Studiestatus.AFGANGSARSAG IS NULL
OR dbo.Optagelse_Studiestatus.AFGANGSARSAG NOT LIKE '2.7.4')
AND (dbo.PopulationStam_V.INDSKRIVNINGSFORM = '1100'
OR dbo.PopulationStam_V.INDSKRIVNINGSFORM = '1700')
GROUP BY dbo.PopulationStam_V.ELEV_ID,
dbo.PopulationStam_V.PERS_ID,
dbo.STUDIESTARTER.STUDST_ID,
dbo.Optagelse_Studiestatus.AFGANGSARSAG,
dbo.Optagelse_Studiestatus.KORT_BETEGNELSE,
dbo.STUDIESTARTER.OPTAG_START_MANED,
Studiestatus,
dbo.ANSOGNINGER.ANSOG_ID,
dbo.ANSOGNINGER.ANKOMSTDATO,
dbo.ANSOGNINGER.EFTEROPTAG,
dbo.ANSOGNINGER.STATUSDATO
I really hope somebody out there can help.
Many ways, this will work:
WITH subSource AS (
/* Your query here */
)
SELECT
s1.id,
/* all other columns work like this:
COALESCE(S1.column,s2.column)
for example: */
coalesce(s1.appid,s2.appid) as appid
FROM subSource s1
INNER JOIN subSource s2
ON s1.study_id =s2.study_id
and s1.study_start = s2.study_start
AND s1.id > s2.id
/* I imagine some other clauses might be needed but maybe not */
The rest is copy paste

Query returning duplicate rows

I wrote this query
SELECT DISTINCT
F2_FILIAL, F2_SERIE, F2_DOC,
C6_NUM, AB7_NUMOS, A1_NOME, F2_EMISSAO, F2_VALBRUT, F2_VEND1,
A3_NOME,F2_COND , E4_DESCRI, C5_NATUREZ, ED_DESCRIC, AAG_DESCRI
FROM
SF2010 SF
LEFT JOIN
SE4010 SE ON F2_COND = E4_CODIGO
LEFT JOIN
SA3010 A3 ON F2_VEND1 = A3_COD
LEFT JOIN
SA1010 A1 ON F2_CLIENTE = A1_COD
LEFT JOIN
SD2010 SD ON F2_DOC = D2_DOC
LEFT JOIN
SC6010 C6 ON D2_PEDIDO = C6_NUM
LEFT JOIN
SC5010 C5 ON D2_PEDIDO = C5_NUM
LEFT JOIN
SED010 ED ON C5_NATUREZ = ED_CODIGO
LEFT JOIN
AB7010 AB ON SUBSTRING(C6_NUMOS,1,6) = AB7_NUMOS
LEFT JOIN
AAG010 AG ON AB7_CODPRB = AAG_CODPRB
WHERE
(F2_CLIENTE >= ' '
AND F2_CLIENTE <= 'zzzzzz')
AND (F2_EMISSAO >= '20170222'
AND F2_EMISSAO <= '20170222')
AND (F2_VEND1 >= ''
AND F2_VEND1 <= 'zz')
AND (C5_NATUREZ >= ''
AND C5_NATUREZ <= 'zzzzzzzzzz')
AND (F2_COND >= ''
AND F2_COND <= 'zzz')
AND (F2_FILIAL >= ''
AND F2_FILIAL <= 'zz')
AND (SF.D_E_L_E_T_ <> '*')
AND F2_DUPL <> ''
AND F2_VALFAT <> 0
ORDER BY
F2_VEND1, F2_EMISSAO
And it results in something like this:
Notice that the 2 last rows are the same (the main field here is F2_DOC, it should never appear twice), but since the field C6_NUM and AB7_NUMOS has more than one reference it displays both of them, duplicating the row.
How can I improve my query to not duplicate a row when the table I'm joining has more than 1 distinct FK to the table I'm querying?
If you are doing multiple left joins and the leftest table in the join doesn't have ( or not mentioned in select ) a unique value, it's possible you get duplicate rows. if you need to have a unique column, use the primary key in the leftest table inside the select statement or add a column like
row_number() as id
to your query.

Confused in join query in SQL

The following works:
SELECT IBAD.TRM_CODE, IBAD.IPABD_CUR_QTY, BM.BOQ_ITEM_NO,
IBAD.BCI_CODE, BCI.BOQ_CODE
FROM IPA_BOQ_ABSTRCT_DTL IBAD,
BOQ_CONFIG_INF BCI,BOQ_MST BM
WHERE BM.BOQ_CODE = BCI.BOQ_CODE
AND BCI.BCI_CODE = IBAD.BCI_CODE
AND BCI.STATUS = 'Y'
AND BM.STATUS = 'Y'
order by boq_item_no;
Results:
But after joining many tables with that query, the result is confusing:
SELECT (SELECT CMN_NAME
FROM CMN_MST
WHERE CMN_CODE= BRI.CMN_RLTY_MTRL) MTRL,
RRI.RRI_RLTY_RATE AS RATE,
I.BOQ_ITEM_NO,
(TRIM(TO_CHAR(IBAD.IPABD_CUR_QTY,
'9999999999999999999999999999990.999'))) AS IPABD_CUR_QTY,
TRIM(TO_CHAR(BRI.BRI_WT_FACTOR,
'9999999999999999999999999999990.999')) AS WT,
TRIM(TO_CHAR((IBAD.IPABD_CUR_QTY*BRI.BRI_WT_FACTOR),
'9999999999999999999999990.999')) AS RLTY_QTY,
(TRIM(TO_CHAR((IBAD.IPABD_CUR_QTY*BRI.BRI_WT_FACTOR*RRI.RRI_RLTY_RATE),
'9999999999999999999999990.99'))) AS TOT_AMT,
I.TRM_CODE AS TRM
FROM
(SELECT * FROM ipa_boq_abstrct_dtl) IBAD
INNER JOIN
(SELECT * FROM BOQ_RLTY_INF) BRI
ON IBAD.BCI_CODE = BRI.BCI_CODE
INNER JOIN
(SELECT * FROM RLTY_RATE_INF) RRI
ON BRI.CMN_RLTY_MTRL = RRI.CMN_RLTY_MTRL
INNER JOIN
( SELECT IBAD.TRM_CODE, IBAD.IPABD_CUR_QTY,
BM.BOQ_ITEM_NO, IBAD.BCI_CODE, BCI.BOQ_CODE
FROM IPA_BOQ_ABSTRCT_DTL IBAD,
BOQ_CONFIG_INF BCI,BOQ_MST BM
WHERE
BM.BOQ_CODE = BCI.BOQ_CODE
AND BCI.BCI_CODE = IBAD.BCI_CODE
and BCI.status = 'Y'
and bm.status = 'Y') I
ON BRI.BCI_CODE = I.BCI_CODE
AND I.TRM_CODE = BRI.TRM_CODE
AND BRI.TRM_CODE =4
group by BRI.CMN_RLTY_MTRL, RRI.RRI_RLTY_RATE, I.BOQ_ITEM_NO,
IBAD.IPABD_CUR_QTY, BRI.BRI_WT_FACTOR, I.TRM_CODE, I.bci_code
order by BRI.CMN_RLTY_MTRL
Results:
TRM should be 11 instead of 4 in the first row.
you getting 4 because you use
AND BRI.TRM_CODE =4
if you remove this criter you can get true result
In your first query, both of the rows you've highlighted have BCI_CODE=1866.
In the second query, you are joining that result set with a number of others (which come from the same tables, which seems odd). In particular, you are joining from the subquery to another table using BCI_CODE, and from there to (SELECT * FROM ipa_boq_abstrct_dtl) IBAD. Since both of the rows from the subquery have the same BCI_CODE, they will join to the same rows in the other tables.
The quantity that you are actually displaying in the second query is from (SELECT * FROM ipa_boq_abstrct_dtl) IBAD, not from the other subquery.
Is the problem simply that you mean to select I.IPABD_CUR_QTY instead of IBAD.IPABD_CUR_QTY?
You might find this clearer if you did not reuse the same aliases for tables at multiple points in the query.