Athena query with joins - sql

I am trying to run a query on Athena that is not behaving as expected:
select distinct aw.year, aw.month, aw.day
from aw
left join f on aw.c1= f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
and aw.year = '2022'
and aw.month = '2'
and aw.day = '5'
I expect this query to return only 2022, 2, 5, but it returns several other values of year, month and day. The query below works fine and returns only 2022, 2, 5.
select distinct aw.year, aw.month, aw.day
from aw
join f on aw.c1= f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
and aw.year = '2022'
and aw.month = '2'
and aw.day = '5'
The problem appears when I use a left join, even though I am also adding the filter on the required year, month and day. Am I doing something logically wrong here?

You appear to be confusing conditions within an ON with conditions in a WHERE.
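Your left join query returned rows where aw had other values because conditions in the ON clause only control which rows of f are matched to each row of aw. In a LEFT JOIN they never remove rows from the 'left' table: an aw row that fails the ON conditions is still returned, with NULLs in the f columns.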
If you only want to return rows where the LEFT table (aw) date matches 2022-02-05, then you should put those conditions in a WHERE:
select distinct aw.year, aw.month, aw.day
from aw
left join f on aw.c1 = f.c1
and aw.c2 = f.c2
and aw.c3 = f.c3
where aw.year = '2022'
and aw.month = '2'
and aw.day = '5'
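To see the difference in isolation, here is a minimal sketch that runs both variants against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for Athena here, and the two-row tables and the single join column are made up for illustration; only the placement of the date filter matters.

```python
import sqlite3

# Hypothetical miniature versions of aw and f (SQLite, not Athena).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aw (c1 INTEGER, year TEXT, month TEXT, day TEXT);
    CREATE TABLE f  (c1 INTEGER);
    INSERT INTO aw VALUES (1, '2022', '2', '5'), (2, '2021', '7', '9');
    INSERT INTO f  VALUES (1), (2);
""")

# Date filter inside ON: it only decides which f rows match.
# Every aw row is still returned, matched or not.
filter_in_on = conn.execute("""
    SELECT DISTINCT aw.year, aw.month, aw.day
    FROM aw
    LEFT JOIN f ON aw.c1 = f.c1
               AND aw.year = '2022' AND aw.month = '2' AND aw.day = '5'
    ORDER BY aw.year
""").fetchall()

# Date filter inside WHERE: it restricts the rows of aw itself.
filter_in_where = conn.execute("""
    SELECT DISTINCT aw.year, aw.month, aw.day
    FROM aw
    LEFT JOIN f ON aw.c1 = f.c1
    WHERE aw.year = '2022' AND aw.month = '2' AND aw.day = '5'
""").fetchall()

print(filter_in_on)     # [('2021', '7', '9'), ('2022', '2', '5')]
print(filter_in_where)  # [('2022', '2', '5')]
```

With an inner join the two placements give the same result, which is why the second query in the question appeared to work.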

Related

How to force postgres to return 0 even if there are no rows matching query, using coalesce, group by and join

I've been trying hopelessly to get the following SQL statement to return the query results and default to 0 if there are no rows matching the query.
This is the intended result:
vol | year
-------+------
0 | 2018
Instead I get:
vol | year
-----+------
(0 rows)
Here is the sql statement:
select coalesce(vol,0) as vol, year
from (select sum(vol) as vol, year
from schema.fact_data
join schema.period_data
on schema.fact_data.period_tag = schema.period_data.tag
join schema.product_data
on schema.fact_data.product_tag =
schema.product_data.tag
join schema.market_data
on schema.fact_data.market_tag = schema.market_data.tag
where "retailer"='MadeUpRetailer'
and "product_tag"='FakeProductTag'
and "year"='2018' group by year
) as DerivedTable;
I know the query works because it returns data when there is data. Just doesn't default to 0 as intended...
Any help in finding why this is the case would be much appreciated!
Using your subquery DerivedTable, you could write:
SELECT coalesce(DerivedTable.vol, 0) AS vol,
y.year
FROM (VALUES ('2018'::text)) AS y(year)
LEFT JOIN (SELECT ...) AS DerivedTable
ON DerivedTable.year = y.year;
Remove the GROUP BY (and the outer query):
select 2018 as year, coalesce(sum(vol), 0) as vol
from schema.fact_data f join
schema.period_data p
on f.period_tag = p.tag join
schema.product_data pr
on f.product_tag = pr.tag join
schema.market_data m
on f.market_tag = m.tag
where "retailer" = 'MadeUpRetailer' and
"product_tag" = 'FakeProductTag' and
"year" = '2018';
An aggregation query with no GROUP BY always returns exactly one row, so this should do what you want.
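The one-row behaviour is easy to check. The sketch below uses SQLite through Python's sqlite3 module rather than Postgres, with a made-up single-table fact_data that has no rows for 2018; the joins from the question are left out because they do not affect the point.

```python
import sqlite3

# Hypothetical stand-in for schema.fact_data with no 2018 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_data (vol INTEGER, year TEXT)")
conn.execute("INSERT INTO fact_data VALUES (10, '2017')")

# With GROUP BY: no matching rows means no groups, so no output rows.
grouped = conn.execute("""
    SELECT COALESCE(SUM(vol), 0) AS vol, year
    FROM fact_data
    WHERE year = '2018'
    GROUP BY year
""").fetchall()

# Without GROUP BY: the aggregate always yields exactly one row,
# and COALESCE turns the NULL sum into 0.
ungrouped = conn.execute("""
    SELECT COALESCE(SUM(vol), 0) AS vol, '2018' AS year
    FROM fact_data
    WHERE year = '2018'
""").fetchall()

print(grouped)    # []
print(ungrouped)  # [(0, '2018')]
```

This also shows why the original COALESCE had no effect: it was applied to rows of the derived table, and there were no rows for it to act on.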
EDIT:
The query would look something like this:
select v.yyyy as year, coalesce(sum(vol), 0) as vol
from (values (2018), (2019)) v(yyyy) left join
schema.fact_data f
on f.year = v.yyyy left join -- this is just an example. I have no idea where year is coming from
schema.period_data p
on f.period_tag = p.tag left join
schema.product_data pr
on f.product_tag = pr.tag left join
schema.market_data m
on f.market_tag = m.tag
group by v.yyyy
However, you have to move the where conditions to the appropriate on clauses. I have no idea where the columns are coming from.
From the code you posted it is not clear in which table you have the year column.
You can use UNION to fetch just 1 row in case there are no rows in that table for the year 2018 like this:
select sum(vol) as vol, year
from schema.fact_data inner join schema.period_data
on schema.fact_data.period_tag = schema.period_data.tag
inner join schema.product_data
on schema.fact_data.product_tag = schema.product_data.tag
inner join schema.market_data
on schema.fact_data.market_tag = schema.market_data.tag
where
"retailer"='MadeUpRetailer' and
"product_tag"='FakeProductTag' and
"year"='2018'
group by "year"
union
select 0 as vol, '2018' as year
where not exists (
select 1 from tablename where "year" = '2018'
)
If there are rows for the year 2018, the second query returns nothing.

SQL Joining tables but getting skipped values

Hi, I have joined two tables; the code is below. Notice that I have used A.Manad = B.Manad, which joins data where the month in tables A and B is equal. But sometimes table B doesn't have any data for that month. My code just skips those rows; I would rather it leave the value empty or set it to 0.
The code takes a list of Orgnr (Swedish for company numbers) and joins two tables where the orgnr is the same and the month is the same, but for some reason it doesn't return the row when one company has no value. I still want the orgnr to show up in the joined table.
select Tillnr = A.tillnr, Orgnr = A.orgnr, Månad = A.Manad, Intrastat =
A.varde,Moms = B.vardeutf
into #Tabell1
From
IntrastatFsum A
left outer join
Momsuppg B
on A.Orgnr = B.Orgnr
where A.Orgnr in(
165563137933,165020456017,.......)
AND A.Ar = 2017
AND B.Ar = A.AR
AND A.Manad = 9
AND A.Manad = B.Manad
AND A.InfUtf = 'U'
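You should move the conditions A.Manad = B.Manad and B.Ar = A.AR from the WHERE clause to the ON clause of the LEFT JOIN. In the WHERE clause they reject every row where the B columns are NULL, which effectively turns the left join into an inner join.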
In this way you will preserve all data from table IntrastatFsum:
select Tillnr = A.tillnr, Orgnr = A.orgnr, Månad = A.Manad, Intrastat =
A.varde,Moms = B.vardeutf
into #Tabell1
From
IntrastatFsum A
left outer join
Momsuppg B
on A.Orgnr = B.Orgnr
AND A.Manad = B.Manad
AND A.AR = B.Ar
where A.Orgnr in(
165563137933,165020456017,.......)
AND A.Ar = 2017
AND A.Manad = 9
AND A.InfUtf = 'U'
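A small sketch of the effect, using SQLite through Python's sqlite3 module instead of SQL Server. The tables a and b and their values are hypothetical, cut down to the columns that matter, and the COALESCE to 0 is an optional addition for the "value of 0" variant mentioned in the question.

```python
import sqlite3

# Hypothetical cut-down versions of IntrastatFsum (a) and Momsuppg (b).
# Company 200 has no row in b for month 9.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (orgnr INTEGER, manad INTEGER, varde INTEGER);
    CREATE TABLE b (orgnr INTEGER, manad INTEGER, vardeutf INTEGER);
    INSERT INTO a VALUES (100, 9, 50), (200, 9, 70);
    INSERT INTO b VALUES (100, 9, 5);
""")

# Month comparison in WHERE: for company 200 b.manad is NULL, the
# comparison is not true, and the row is dropped.
in_where = conn.execute("""
    SELECT a.orgnr, a.varde, b.vardeutf
    FROM a LEFT JOIN b ON a.orgnr = b.orgnr
    WHERE a.manad = 9 AND a.manad = b.manad
    ORDER BY a.orgnr
""").fetchall()

# Month comparison in ON: company 200 is kept, and COALESCE replaces
# the missing value with 0.
in_on = conn.execute("""
    SELECT a.orgnr, a.varde, COALESCE(b.vardeutf, 0)
    FROM a LEFT JOIN b ON a.orgnr = b.orgnr AND a.manad = b.manad
    WHERE a.manad = 9
    ORDER BY a.orgnr
""").fetchall()

print(in_where)  # [(100, 50, 5)]
print(in_on)     # [(100, 50, 5), (200, 70, 0)]
```

Filters that mention only the left table, such as the month and year of A, can stay in the WHERE clause.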

Oracle SQL - using the coalesce function

I really can't get my head around the coalesce function ... or if this is even the best way to get the result I'm trying to achieve.
I have three dates in the following script (iv.dated, iv1.dated, dh.actshpdate). When I run the following script the dates are in separate columns (as expected);
select unique li.catnr, li.av_part_no, li.artist||' / '||li.title description, li.cust_catnr pallet_ref,
trunc(iv.dated), trunc(iv1.dated), trunc(dh.actshpdate)
from leos_item li
left join invtran_view_oes iv
on li.av_part_no = iv.part_no
and (iv.transaction = 'NREC' and iv.location_no = ' RETURNS W')
left join invtran_view_oes iv1
on li.av_part_no = iv1.part_no
and (iv1.transaction = 'CORR+' and iv1.remark like 'STOCK FROM SP PALLET%')
left join oes_delsegview od
on od.catnr = li.catnr
and od.prodtyp = li.prodtyp
and od.packtyp = li.packtyp
left join oes_dpos dp
on od.ordnr = dp.ordnr
and od.posnr = dp.posnr
and od.segnr = dp.segnr
left join oes_dhead dh
on dp.dheadnr = dh.dheadnr
where li.cunr = '816900'
and substr(li.catnr,1,5) in ('RGMCD','RGJCD')
and li.item_type = 'FP'
and li.catnr = 'RGJCD221'
What I would like to achieve is one column with all dates in date order.
I tried replacing my dates with ...
trunc(coalesce(iv.dated, iv1.dated, dh.actshpdate)) transaction_date
... but, I lose some of the dates;
How can I achieve the following result?
You could use UNION in the following way -
WITH DATA AS(
<your query goes here>
)
SELECT A, b, c, d, e FROM DATA
UNION
SELECT A,b,c,d,f FROM DATA
UNION
SELECT A,b,c,d,g FROM DATA
where a, b, c, d, e, f, g are the column aliases of the select list in your original query. You can give your own column aliases in the UNION query.
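The reason COALESCE loses dates is that it returns only the first non-NULL argument for each row, so a row carrying two dates yields just one of them. The sketch below shows this next to the UNION approach, using SQLite through Python's sqlite3 module rather than Oracle. The data table, its column names d1, d2, d3 and the dates are made up, and the IS NOT NULL filters and ORDER BY are additions that are not part of the answer above.

```python
import sqlite3

# Hypothetical result of the original query: one row, two dates filled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (catnr TEXT, d1 TEXT, d2 TEXT, d3 TEXT);
    INSERT INTO data VALUES ('RGJCD221', '2015-01-10', '2015-02-20', NULL);
""")

# COALESCE picks the first non-NULL date per row; the second is lost.
coalesced = conn.execute(
    "SELECT catnr, COALESCE(d1, d2, d3) FROM data"
).fetchall()

# UNION turns each date column into its own rows, sorted by date.
unpivoted = conn.execute("""
    SELECT catnr, d1 AS transaction_date FROM data WHERE d1 IS NOT NULL
    UNION
    SELECT catnr, d2 FROM data WHERE d2 IS NOT NULL
    UNION
    SELECT catnr, d3 FROM data WHERE d3 IS NOT NULL
    ORDER BY 2
""").fetchall()

print(coalesced)  # [('RGJCD221', '2015-01-10')]
print(unpivoted)  # [('RGJCD221', '2015-01-10'), ('RGJCD221', '2015-02-20')]
```

Note that UNION also removes duplicate rows; UNION ALL would keep them if the same date appears in more than one column.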

Confused in join query in SQL

The following works:
SELECT IBAD.TRM_CODE, IBAD.IPABD_CUR_QTY, BM.BOQ_ITEM_NO,
IBAD.BCI_CODE, BCI.BOQ_CODE
FROM IPA_BOQ_ABSTRCT_DTL IBAD,
BOQ_CONFIG_INF BCI,BOQ_MST BM
WHERE BM.BOQ_CODE = BCI.BOQ_CODE
AND BCI.BCI_CODE = IBAD.BCI_CODE
AND BCI.STATUS = 'Y'
AND BM.STATUS = 'Y'
order by boq_item_no;
Results:
But after joining many tables with that query, the result is confusing:
SELECT (SELECT CMN_NAME
FROM CMN_MST
WHERE CMN_CODE= BRI.CMN_RLTY_MTRL) MTRL,
RRI.RRI_RLTY_RATE AS RATE,
I.BOQ_ITEM_NO,
(TRIM(TO_CHAR(IBAD.IPABD_CUR_QTY,
'9999999999999999999999999999990.999'))) AS IPABD_CUR_QTY,
TRIM(TO_CHAR(BRI.BRI_WT_FACTOR,
'9999999999999999999999999999990.999')) AS WT,
TRIM(TO_CHAR((IBAD.IPABD_CUR_QTY*BRI.BRI_WT_FACTOR),
'9999999999999999999999990.999')) AS RLTY_QTY,
(TRIM(TO_CHAR((IBAD.IPABD_CUR_QTY*BRI.BRI_WT_FACTOR*RRI.RRI_RLTY_RATE),
'9999999999999999999999990.99'))) AS TOT_AMT,
I.TRM_CODE AS TRM
FROM
(SELECT * FROM ipa_boq_abstrct_dtl) IBAD
INNER JOIN
(SELECT * FROM BOQ_RLTY_INF) BRI
ON IBAD.BCI_CODE = BRI.BCI_CODE
INNER JOIN
(SELECT * FROM RLTY_RATE_INF) RRI
ON BRI.CMN_RLTY_MTRL = RRI.CMN_RLTY_MTRL
INNER JOIN
( SELECT IBAD.TRM_CODE, IBAD.IPABD_CUR_QTY,
BM.BOQ_ITEM_NO, IBAD.BCI_CODE, BCI.BOQ_CODE
FROM IPA_BOQ_ABSTRCT_DTL IBAD,
BOQ_CONFIG_INF BCI,BOQ_MST BM
WHERE
BM.BOQ_CODE = BCI.BOQ_CODE
AND BCI.BCI_CODE = IBAD.BCI_CODE
and BCI.status = 'Y'
and bm.status = 'Y') I
ON BRI.BCI_CODE = I.BCI_CODE
AND I.TRM_CODE = BRI.TRM_CODE
AND BRI.TRM_CODE =4
group by BRI.CMN_RLTY_MTRL, RRI.RRI_RLTY_RATE, I.BOQ_ITEM_NO,
IBAD.IPABD_CUR_QTY, BRI.BRI_WT_FACTOR, I.TRM_CODE, I.bci_code
order by BRI.CMN_RLTY_MTRL
Results:
TRM should be 11 instead of 4 in the first row.
You are getting 4 because you use
AND BRI.TRM_CODE =4
If you remove this criterion, you should get the correct result.
In your first query, both of the rows you've highlighted have BCI_CODE=1866.
In the second query, you are joining that result set with a number of others (which come from the same tables, which seems odd). In particular, you are joining from the subquery to another table using BCI_CODE, and from there to (SELECT * FROM ipa_boq_abstrct_dtl) IBAD. Since both of the rows from the subquery have the same BCI_CODE, they will join to the same rows in the other tables.
The quantity that you are actually displaying in the second query is from (SELECT * FROM ipa_boq_abstrct_dtl) IBAD, not from the other subquery.
Is the problem simply that you mean to select I.IPABD_CUR_QTY instead of IBAD.IPABD_CUR_QTY?
You might find this clearer if you did not reuse the same aliases for tables at multiple points in the query.

Optimization DB2 query with mass join

I have a complex query:
select rma.RELATION_MANAGER_ID,
rm.ORG_STRUCTURE_ID,
rm.RELATIONSHIP_MANAGER_NM,
count(distinct ppa.PARTY_ID) as count_party
from RELATIONSHIP_MANAGER rm --15808 row
join RELATIONSHIP_MANAGER_MARKET rmm --1560 row
on rm.RELATIONSHIP_MANAGER_ID = rmm.RELATIONSHIP_MANAGER_ID
and rmm.INCLUDE_IN_REPORT = 'Y'
join MARKET_SEGMENT rm_ms --4 row
on rmm.MARKET_SEGMENT_ID = rm_ms.MARKET_SEGMENT_ID
and rm_ms.MARKET_SEGMENT = '01'
join RELATIONSHIP_MANAGER_ALLOCATION rma --61349 row
on rm.RELATIONSHIP_MANAGER_ID = rma.RELATIONSHIP_MANAGER_ID
join CMD_PARTY_PORTFOLIO_ALLOCATION ppa --3114096 row
on ppa.PORTFOLIO_ID = rma.PORTFOLIO_ID
join person ps --3112575 row
on ps.IS_DELETED != 1 and ppa.party_id = ps.party_id
join PARTY p --3114146 row
on ppa.party_id=p.party_id
join MARKET_SEGMENT ms --4 row
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID and ms.MARKET_SEGMENT = '01'
where rm.IS_CM = 1 and rm.IS_DELETED != 1
group by rm.RELATIONSHIP_MANAGER_NM, rma.RELATIONSHIP_MANAGER_ID, rm.ORG_STRUCTURE_ID
Table columns have indexes:
rm.RELATIONSHIP_MANAGER_ID,
rmm.RELATIONSHIP_MANAGER_ID,
rmm.MARKET_SEGMENT_ID,
rm_ms.MARKET_SEGMENT_ID,
rma.RELATIONSHIP_MANAGER_ID,
ppa.PORTFOLIO_ID,
rma.PORTFOLIO_ID,
ppa.party_id,
ps.party_id,
p.party_id,
p.MARKET_SEGMENT_ID,
ms.MARKET_SEGMENT_ID
The PARTY and PERSON tables have ~1-3 million rows, and the query takes ~20 seconds to run. When I comment out the MARKET_SEGMENT condition in this join:
join MARKET_SEGMENT ms
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID --and ms.MARKET_SEGMENT = '01'
the runtime drops to ~3 seconds.
Can you explain why this is happening?
The explain plan doesn't help me. How can I optimize the query?
EDIT:
The platform is DB2 for z/OS V9.7.
I have added the table sizes as comments in the query.
EDIT2: The explain plan shows that the small tables are always joined first.
Just for grins, see if this makes any difference:
WITH
MktSeg( MARKET_SEGMENT_ID ) AS
( SELECT MARKET_SEGMENT_ID
FROM MARKET_SEGMENT
WHERE MARKET_SEGMENT = '01' )
select rma.RELATION_MANAGER_ID,
rm.ORG_STRUCTURE_ID,
rm.RELATIONSHIP_MANAGER_NM,
count(distinct ppa.PARTY_ID) as count_party
from RELATIONSHIP_MANAGER rm --15808 row
join RELATIONSHIP_MANAGER_MARKET rmm --1560 row
on rm.RELATIONSHIP_MANAGER_ID = rmm.RELATIONSHIP_MANAGER_ID
and rmm.INCLUDE_IN_REPORT = 'Y'
join MktSeg rm_ms --4 row
on rmm.MARKET_SEGMENT_ID = rm_ms.MARKET_SEGMENT_ID
join RELATIONSHIP_MANAGER_ALLOCATION rma --61349 row
on rm.RELATIONSHIP_MANAGER_ID = rma.RELATIONSHIP_MANAGER_ID
join CMD_PARTY_PORTFOLIO_ALLOCATION ppa --3114096 row
on ppa.PORTFOLIO_ID = rma.PORTFOLIO_ID
join person ps --3112575 row
on ps.IS_DELETED != 1 and ppa.party_id = ps.party_id
join PARTY p --3114146 row
on ppa.party_id=p.party_id
join MktSeg ms --4 row
on p.MARKET_SEGMENT_ID = ms.MARKET_SEGMENT_ID
WHERE rm.IS_CM = 1 AND rm.IS_DELETED != 1
group by rm.RELATIONSHIP_MANAGER_NM, rma.RELATIONSHIP_MANAGER_ID,
rm.ORG_STRUCTURE_ID, rma.RELATION_MANAGER_ID
Note also that I've added an item to the Group By clause.