Oracle SQL outer join query puzzle - sql

select
whatever
from
bank_accs b1,
bank_accs b2,
table3 t3
where
t3.bank_acc_id = b1.bank_acc_id and
b2.bank_acc_number = b1.bank_acc_number and
b2.currency_code(+) = t3.buy_currency and
trunc(sysdate) between nvl(b2.start_date, trunc(sysdate)) and nvl(b2.end_date, trunc(sysdate));
My problem is with the date-validity check on b2. I need to return a row for each t3 x b1 combination (t3 actually stands for about 10 joined tables), even if b2 contains ONLY records that are invalid date-wise. How do I outer-join this part properly?
I can't use ANSI joins, and it must be a single flat query.
Thanks.

If I understand you correctly, just add the outer-join operator (+) to every b2 column in the WHERE clause:
select
whatever
from
bank_accs b1,
bank_accs b2,
table3 t3
where
t3.bank_acc_id = b1.bank_acc_id and
b2.bank_acc_number(+) = b1.bank_acc_number and
b2.currency_code(+) = t3.buy_currency and
trunc(sysdate) between nvl(b2.start_date(+), trunc(sysdate)) and nvl(b2.end_date(+), trunc(sysdate));

It's possible to write an old-style outer join with inequalities, but it is error-prone. I suggest you use an inline view so that the outer join is clear and explicit:
SELECT whatever
FROM bank_accs b1,
table3 t3,
(SELECT b2.*
FROM bank_accs b2
WHERE trunc(sysdate) BETWEEN nvl(b2.start_date, trunc(sysdate))
AND nvl(b2.end_date, trunc(sysdate))
) b2
WHERE t3.bank_acc_id = b1.bank_acc_id
AND b2.bank_acc_number(+) = b1.bank_acc_number
AND b2.currency_code(+) = t3.buy_currency;
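
The difference between filtering the optional table before the outer join and filtering it afterwards can be tried out in a small sketch. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module, SQLite has no (+) operator so an ANSI LEFT JOIN stands in for it, the tables are cut down to a few columns, and the sample rows and the fixed TODAY date are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t3 (id INTEGER, buy_currency TEXT);
CREATE TABLE bank_accs (currency_code TEXT, start_date TEXT, end_date TEXT);
INSERT INTO t3 VALUES (1, 'USD'), (2, 'EUR');
-- USD has a currently valid row; EUR only has an expired one (made-up data).
INSERT INTO bank_accs VALUES ('USD', '2000-01-01', '2999-12-31');
INSERT INTO bank_accs VALUES ('EUR', '2000-01-01', '2001-12-31');
""")

TODAY = "2024-06-01"  # fixed stand-in for trunc(sysdate), keeps the result reproducible

# Date check in WHERE: evaluated after the outer join, so the EUR row is lost.
filter_after = conn.execute("""
    SELECT t3.id, b2.currency_code
    FROM t3
    LEFT JOIN bank_accs b2 ON b2.currency_code = t3.buy_currency
    WHERE ? BETWEEN COALESCE(b2.start_date, ?) AND COALESCE(b2.end_date, ?)
    ORDER BY t3.id
""", (TODAY, TODAY, TODAY)).fetchall()

# Date check inside the inline view: invalid rows are removed first,
# then the outer join keeps every t3 row.
filter_before = conn.execute("""
    SELECT t3.id, b2.currency_code
    FROM t3
    LEFT JOIN (
        SELECT * FROM bank_accs
        WHERE ? BETWEEN COALESCE(start_date, ?) AND COALESCE(end_date, ?)
    ) b2 ON b2.currency_code = t3.buy_currency
    ORDER BY t3.id
""", (TODAY, TODAY, TODAY)).fetchall()

print(filter_after)
print(filter_before)
```

With this sample data the first query returns only the USD row, while the second also returns the EUR row with NULL in place of the b2 columns, which is the behaviour the question asks for.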

Related

How to make a join on a table in an sql query that contains several others?

I am a beginner in SQL. I have a query that returns the information I need from the table ARTICLE_MODE:
GA_CODEARTICLE | C1             | C2 | C3       | GA_LIBELLE | C5   | C6      | GA_LIBREART3 | GA_LIBREART5
BUTSS5-RC      | SURF HARD WARE | -  | Wetsuits | DAY COVER  | 2021 | UNISEXE | SURF         | SOF
I need to retrieve information on a column of a second table.
The column MZS_DPAETAST of the table MTMPTVGEN.
In these two tables, two columns contain some identical information:
The GA_CODEARTICLE column from the ARTICLE_MODE table.
The column MZS_ARTICLE of the table MTMPTVGEN.
GA_CODEARTICLE | MZS_ARTICLE
BUTSS5-RC      | BUTSS5-RC
BUTS85-RC      | BUTS85-RC
BUTS75-RC      | VMA045-VC
I tried the following query to retrieve the MZS_DPAETAST values for the rows where GA_CODEARTICLE and MZS_ARTICLE match, but it returns many results:
select MZS_DPAETAST from MTMPTVGEN LEFT OUTER JOIN ARTICLE_MODE on MZS_ARTICLE=GA_CODEARTICLE
But how can I insert it in my initial query? Thanks for your help.
SELECT GA_CODEARTICLE, CC1.CC_LIBELLE AS C1,
YX2.YX_LIBELLE AS C2,
YX3.YX_LIBELLE AS C3,
GA_LIBELLE,
CC4.CC_LIBELLE AS C5,
CC5.CC_LIBELLE AS C6,
CC6.CC_LIBELLE AS C15,
GA_LIBREART3,
GA_LIBREART5
FROM ARTICLE_MODE
LEFT OUTER JOIN PGI_LOOKUP(GCFAMILLENIV1) CC1 ON GA_FAMILLENIV1=CC1.CC_CODE
AND CC1.CC_TYPE="FN1"
LEFT OUTER JOIN PGI_LOOKUP(GCLIBREART1) YX2 ON GA_LIBREART1=YX2.YX_CODE
AND YX2.YX_TYPE="LA1"
LEFT OUTER JOIN PGI_LOOKUP(GCLIBREART2) YX3 ON GA_LIBREART2=YX3.YX_CODE
AND YX3.YX_TYPE="LA2"
LEFT OUTER JOIN PGI_LOOKUP(GCCOLLECTION) CC4 ON GA_COLLECTION=CC4.CC_CODE
AND CC4.CC_TYPE="GCO"
LEFT OUTER JOIN PGI_LOOKUP(GCFAMILLENIV2) CC5 ON GA_FAMILLENIV2=CC5.CC_CODE
AND CC5.CC_TYPE="FN2"
LEFT OUTER JOIN PGI_LOOKUP(GCFAMILLENIV5) CC6 ON GA2_FAMILLENIV5=CC6.CC_CODE
AND CC6.CC_TYPE="FN5"
WHERE (GA_EMBALLAGE<>"X"
AND (GA_TYPEARTICLE NOT IN ("PRE","FI","FRA","UL","PAC"))
AND ((GA_STATUTART="GEN")))
ORDER BY GA_DATEMODIF DESC
It is hard to understand what you are having problems with. You are showing that you know how to join article_mode and mtmptvgen. You have a query containing article_mode. So what keeps you from joining mtmptvgen there?
Your query has many outer joins. I cannot know whether these are really necessary. I don't know for mtmptvgen either. I am showing an outer join, but you can of course make this an inner join if that suffices.
SELECT
am.ga_codearticle,
cc1.cc_libelle as c1,
yx2.yx_libelle as c2,
yx3.yx_libelle as c3,
am.ga_libelle,
cc4.cc_libelle as c5,
cc5.cc_libelle as c6,
cc6.cc_libelle as c15,
am.ga_libreart3,
am.ga_libreart5,
m.mzs_dpaetast
FROM article_mode am
LEFT OUTER JOIN pgi_lookup(gcfamilleniv1) cc1 ON am.ga_familleniv1 = cc1.cc_code AND cc1.cc_type = 'FN1'
LEFT OUTER JOIN pgi_lookup(gclibreart1) yx2 ON am.ga_libreart1 = yx2.yx_code AND yx2.yx_type = 'LA1'
LEFT OUTER JOIN pgi_lookup(gclibreart2) yx3 ON am.ga_libreart2 = yx3.yx_code AND yx3.yx_type = 'LA2'
LEFT OUTER JOIN pgi_lookup(gccollection) cc4 ON am.ga_collection = cc4.cc_code AND cc4.cc_type = 'GCO'
LEFT OUTER JOIN pgi_lookup(gcfamilleniv2) cc5 ON am.ga_familleniv2 = cc5.cc_code AND cc5.cc_type = 'FN2'
LEFT OUTER JOIN pgi_lookup(gcfamilleniv5) cc6 ON am.ga2_familleniv5 = cc6.cc_code AND cc6.cc_type = 'FN5'
LEFT OUTER JOIN mtmptvgen m ON m.mzs_article = am.ga_codearticle
WHERE am.ga_emballage <> 'X'
AND am.ga_typearticle NOT IN ('PRE', 'FI', 'FRA', 'UL', 'PAC')
AND am.ga_statutart = 'GEN'
ORDER BY am.ga_datemodif DESC;
(I've added the missing column qualifiers and replaced the inappropriate double quotes with single quotes. These are supposed to be string literals, not column names, right? I also think that it makes queries hard to read when everything is in upper case and there are no spaces between operators, column names, and literals.)
UPDATE
You say that there is just one mzs_dpaetast value per article_mode, but the data model is inappropriate and stores it redundantly in every related mtmptvgen row. In order to deal with this, you'll want to select one mtmptvgen row per article_mode row only. One way to do so is to join a query instead of the table:
LEFT OUTER JOIN
(
SELECT mzs_article, MAX(mzs_dpaetast) AS mzs_dpaetast
FROM mtmptvgen
GROUP BY mzs_article
) m ON m.mzs_article = am.ga_codearticle
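
To see why joining the aggregated subquery avoids duplicated rows, here is a minimal sketch. It assumes SQLite via Python's sqlite3 module as a stand-in database, tables cut down to the join columns, and invented sample values ('A' is not real data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_mode (ga_codearticle TEXT);
CREATE TABLE mtmptvgen (mzs_article TEXT, mzs_dpaetast TEXT);
INSERT INTO article_mode VALUES ('BUTSS5-RC'), ('BUTS75-RC');
-- The same value is stored redundantly on several rows (made-up sample data).
INSERT INTO mtmptvgen VALUES ('BUTSS5-RC', 'A'), ('BUTSS5-RC', 'A'), ('BUTSS5-RC', 'A');
""")

# Joining the table directly repeats the article once per mtmptvgen row.
direct = conn.execute("""
    SELECT am.ga_codearticle, m.mzs_dpaetast
    FROM article_mode am
    LEFT OUTER JOIN mtmptvgen m ON m.mzs_article = am.ga_codearticle
    ORDER BY am.ga_codearticle
""").fetchall()

# Joining the aggregated subquery yields exactly one row per article.
collapsed = conn.execute("""
    SELECT am.ga_codearticle, m.mzs_dpaetast
    FROM article_mode am
    LEFT OUTER JOIN (
        SELECT mzs_article, MAX(mzs_dpaetast) AS mzs_dpaetast
        FROM mtmptvgen
        GROUP BY mzs_article
    ) m ON m.mzs_article = am.ga_codearticle
    ORDER BY am.ga_codearticle
""").fetchall()

print(len(direct), direct)
print(len(collapsed), collapsed)
```

The direct join returns four rows for two articles here, while the aggregated join returns two, with NULL for the article that has no mtmptvgen row.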

SQL Selecting Max Value 4 Joins

I'm stuck with this query. I want to get the following information from my database:
User.username, Satz.Gewicht, Satz.Wiederholungen, ubungen.Name and Training.Datum
The catch is that I want these columns for every distinct exercise (ubungen.id).
I tried this:
SELECT
a3.Datum,
a4.username,
a2.Name,
max(Gewicht) as Gewicht,
Wiederholungen
from satz a1
INNER JOIN ubungen a2 ON a1.UBID = a2.ID
INNER JOIN training a3 ON a1.TID = a3.ID
INNER JOIN user a4 ON a3.UID = a4.ID
GROUP BY a1.UBID
But somehow I'm getting the right weight (max(Gewicht)) with the wrong user.
What am I doing wrong?
Here is a Screenshot of my database design:
EDIT:
When I put every column in my GROUP BY, I get multiple rows like this:
But I just want the one in the middle, as it is the one with the highest Gewicht.
Example of what I want:
This is the whole result set after joining the information I want. Now I just want the lines that are marked in red. Some lines share the same max(Gewicht); in that case it doesn't matter which one is picked.
You have to list every selected column in the GROUP BY, except the ones wrapped in an aggregate function.
Try this out:
SELECT
a3.Datum, a4.username, a2.Name, max(Gewicht) as Gewicht, Wiederholungen
FROM satz a1
INNER JOIN ubungen a2
ON a1.UBID = a2.ID
INNER JOIN training a3
ON a1.TID = a3.ID
INNER JOIN user a4
ON a3.UID = a4.ID
GROUP BY
a3.Datum, a4.username, a2.Name, Wiederholungen
I think you are just looking for the record with the maximum value for each ubid:
select x.Ubid,x.Datum,x.username,x.name,x.Gewicht,x.Wiederholungen
from
(SELECT
a1.Ubid,
a3.Datum,
a4.username,
a2.Name,
Gewicht,
Wiederholungen,
row_number() over (partition by a1.ubid order by Gewicht desc) as rownum1
from satz a1
INNER JOIN ubungen a2 ON a1.UBID = a2.ID
INNER JOIN training a3 ON a1.TID = a3.ID
INNER JOIN user a4 ON a3.UID = a4.ID ) x
where x.rownum1=1
I have made use of the row_number function to pull the record with the highest Gewicht value. Hope this helps.
Edit: Adding a db-fiddle, it's working fine in this example:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=0ac9caf422a06feb6e7c05c1a68995f5
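
For readers who want to try the row_number approach locally, the following is a small sketch of the same pattern. The assumptions are: SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), the four joined tables flattened into one made-up satz table, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened stand-in for satz joined to ubungen, training and user.
CREATE TABLE satz (ubid INTEGER, username TEXT, gewicht INTEGER, wiederholungen INTEGER);
INSERT INTO satz VALUES
    (1, 'anna', 60, 10),
    (1, 'ben',  80,  5),
    (2, 'anna', 40, 12),
    (2, 'ben',  35,  8);
""")

# Number the rows of each exercise from heaviest to lightest, keep rank 1.
best = conn.execute("""
    SELECT ubid, username, gewicht, wiederholungen
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY ubid ORDER BY gewicht DESC) AS rn
        FROM satz s
    )
    WHERE rn = 1
    ORDER BY ubid
""").fetchall()

print(best)
```

Because the whole row is kept, the username and Wiederholungen come from the same row as the maximum Gewicht, which is what a plain GROUP BY with max() does not guarantee.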
Since row_number is not working, you can use an inner join to get the max value:
select x.Ubid,x.Datum,x.name,x.Gewicht,max(x.Wiederholungen) as Wiederholungen,max(x.username) as username
from
(SELECT
a1.Ubid,
a3.Datum,
a4.username,
a2.Name,
a1.Gewicht,
a1.Wiederholungen
from satz a1
INNER JOIN ubungen a2 ON a1.UBID = a2.ID
INNER JOIN training a3 ON a1.TID = a3.ID
INNER JOIN user a4 ON a3.UID = a4.ID ) x
INNER JOIN
(select ubid,max(Gewicht) as max_Gewicht from satz group by ubid) y
ON x.ubid=y.ubid and x.Gewicht=y.max_Gewicht
Group by x.Ubid,x.Datum,x.name,x.Gewicht
Your query runs, but it's malformed.
Columns that are not listed in the GROUP BY clause should be aggregated. MariaDB (and MySQL) are notorious for allowing non-aggregated columns in the SELECT clause.
Solution: Just add all non-aggregated columns to the GROUP BY clause to solve your issue, as in:
GROUP BY a1.UBID, a3.Datum, a4.username, a2.Name, Wiederholungen
As it is, MariaDB picks the values of those columns arbitrarily. Today you see the wrong values, but maybe tomorrow they are good. You don't want a query behaving like this.

JOIN by CASE expression

I would like to run this code:
select * from a
right join s
on case when s.[Diff ] = 0 and a.ActivityDate < s.[ExecDate]
then a.ID1 =s.ID2
when
( a.ActivityDate <s.[ExecDate] and a.ActivityDate >= s.[Date3] )
then a.ID1 =s.ID2
END
The CASE is pointless. You join on the same two fields either way, so just move the CASE conditions into the join condition:
SELECT ...
JOIN ... ON ((a.ID1 = s.ID2) AND ((case #1) OR (case #2)))
Just to elaborate on Marc's answer, I think the simplest form is:
select *
from a right join
s
on a.ID1 = s.ID2 and a.ActivityDate < s.[ExecDate] and
(s.[Diff ] = 0 or a.ActivityDate >= s.[Date3])
Note that I do advise using left join instead of right join. It is usually more intuitive to read a query thinking "all the rows in the first table are kept, as well as matching rows in other tables."
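
The combined boolean join condition can be checked with a short sketch. It is written under assumptions: SQLite through Python's sqlite3 module, the left-join form recommended above (s first, then a), column names simplified to id2, diff, exec_date and date3, and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s (id2 INTEGER, diff INTEGER, exec_date TEXT, date3 TEXT);
CREATE TABLE a (id1 INTEGER, activity_date TEXT);
INSERT INTO s VALUES (1, 0, '2024-03-01', '2024-02-01'),
                     (2, 5, '2024-03-01', '2024-02-01'),
                     (3, 5, '2024-03-01', '2024-02-01');
INSERT INTO a VALUES (1, '2024-01-15'),  -- before date3, but diff = 0: matches
                     (2, '2024-02-10'),  -- between date3 and exec_date: matches
                     (3, '2024-01-15');  -- before date3 and diff <> 0: no match
""")

# Every row of s is kept; a is attached only when the whole condition holds.
rows = conn.execute("""
    SELECT s.id2, a.id1
    FROM s
    LEFT JOIN a
      ON a.id1 = s.id2
     AND a.activity_date < s.exec_date
     AND (s.diff = 0 OR a.activity_date >= s.date3)
    ORDER BY s.id2
""").fetchall()

print(rows)
```

The third s row survives with NULL for a.id1, showing that the date logic acts as part of the join condition and not as a filter on the result.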

Efficiently updating a table from multiple sources

I'm working on improving part of an existing ETL layer in Oracle.
1. A file is loaded into a temporary table.
2. Many MERGE statements are executed to resolve surrogate keys.
3. Some other business logic is applied (which requires those surrogate keys).
4. The results are MERGEd into a table (with both the surrogate keys and the business logic results).
It's step 2 that I want to improve; doing it as several separate statements seems less than ideal.
MERGE INTO temp t
USING dimension_1 d1 ON (d1.natural_key = t.d1_natural_key)
WHEN MATCHED THEN UPDATE SET t.d1_id = d1.id;
MERGE INTO temp t
USING dimension_2 d2 ON (d2.natural_key = t.d2_natural_key)
WHEN MATCHED THEN UPDATE SET t.d2_id = d2.id;
MERGE INTO temp t
USING dimension_3 d3 ON (d3.natural_key = t.d3_natural_key)
WHEN MATCHED THEN UPDATE SET t.d3_id = d3.id;
If I was writing this in SQL Server I'd do something like the following:
UPDATE
t
SET
d1_id = COALESCE(d1.id, -1),
d2_id = COALESCE(d2.id, -1),
d3_id = COALESCE(d3.id, -1)
FROM
temp t
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t.d3_natural_key
For the life of me I can't find what seems like a sensible option in Oracle. The best I have been able to work out is to use UPDATE (while everyone around me is screaming that I 'must' use MERGE) and correlated sub-queries; something like...
UPDATE
temp t
SET
d1_id = COALESCE((SELECT id FROM dimension_1 d1 WHERE d1.natural_key = t.d1_natural_key), -1),
d2_id = COALESCE((SELECT id FROM dimension_2 d2 WHERE d2.natural_key = t.d2_natural_key), -1),
d3_id = COALESCE((SELECT id FROM dimension_3 d3 WHERE d3.natural_key = t.d3_natural_key), -1)
Are there any better alternatives? Or is the correlated sub-query approach actually performant in Oracle?
I think the equivalent of your SQL Server update would be:
UPDATE
temp t1
SET
(d1_id, d2_id, d3_id) = (
SELECT
COALESCE(d1.id, -1),
COALESCE(d2.id, -1),
COALESCE(d3.id, -1)
FROM
temp t2
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t2.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t2.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t2.d3_natural_key
WHERE
t2.id = t1.id
)
It's still a correlated update; the joining takes place in the subquery, since Oracle doesn't let you join as part of the update itself. Normally you wouldn't need (or want) to refer to the target outer table again in the subquery, but you need something to outer-join against here.
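
The shape of this multi-column correlated update can be tried out in a small sketch. The assumptions: SQLite through Python's sqlite3 module as a stand-in for Oracle (row-value SET needs SQLite 3.15 or newer), a table named staging instead of temp, only two dimensions, and invented keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimension_1 (id INTEGER, natural_key TEXT);
CREATE TABLE dimension_2 (id INTEGER, natural_key TEXT);
CREATE TABLE staging (id INTEGER, d1_natural_key TEXT, d2_natural_key TEXT,
                      d1_id INTEGER, d2_id INTEGER);
INSERT INTO dimension_1 VALUES (10, 'a');
INSERT INTO dimension_2 VALUES (20, 'x');
INSERT INTO staging (id, d1_natural_key, d2_natural_key)
VALUES (1, 'a', 'x'), (2, 'a', 'missing');
""")

# One statement sets both surrogate keys; the subquery re-reads the row
# being updated (alias s2) so there is something to outer-join against.
conn.execute("""
    UPDATE staging
    SET (d1_id, d2_id) = (
        SELECT COALESCE(d1.id, -1), COALESCE(d2.id, -1)
        FROM staging s2
        LEFT JOIN dimension_1 d1 ON d1.natural_key = s2.d1_natural_key
        LEFT JOIN dimension_2 d2 ON d2.natural_key = s2.d2_natural_key
        WHERE s2.id = staging.id
    )
""")

resolved = conn.execute("SELECT id, d1_id, d2_id FROM staging ORDER BY id").fetchall()
print(resolved)
```

The second row has no match in dimension_2, so its d2_id falls back to -1 while d1_id is still resolved.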
You can also combine the left-join approach with a merge, putting essentially the same subquery into the using clause:
MERGE INTO temp t
USING (
SELECT t.id,
COALESCE(d1.id, -1) AS d1_id,
COALESCE(d2.id, -1) AS d2_id,
COALESCE(d3.id, -1) AS d3_id
FROM
temp t
LEFT JOIN
dimension_1 d1
ON d1.natural_key = t.d1_natural_key
LEFT JOIN
dimension_2 d2
ON d2.natural_key = t.d2_natural_key
LEFT JOIN
dimension_3 d3
ON d3.natural_key = t.d3_natural_key
) d
ON (d.id = t.id)
WHEN MATCHED THEN UPDATE SET
t.d1_id = d.d1_id,
t.d2_id = d.d2_id,
t.d3_id = d.d3_id
I don't see any real benefit of using merge over update in this case though.
Both will overwrite any existing values in your three ID columns, but it sounds like you are not expecting there to be any.
I believe this may be more efficient than Alex's answer -- requiring only one access of the temp table, instead of two. On my quick test of a million rows, performance was about the same, but the plan is better since there is no second access of the temp table. It may be worth trying on your data set.
UPDATE
( SELECT d1.id s_d1_id,
d2.id s_d2_id,
d3.id s_d3_id,
mt.d1_id,
mt.d2_id,
mt.d3_id
FROM temp mt
LEFT JOIN dimension_1 d1 ON d1.natural_key = mt.d1_natural_key
LEFT JOIN dimension_2 d2 ON d2.natural_key = mt.d2_natural_key
LEFT JOIN dimension_3 d3 ON d3.natural_key = mt.d3_natural_key )
SET d1_id = COALESCE (s_d1_id, -1), d2_id = COALESCE (s_d2_id, -1), d3_id = COALESCE (s_d3_id, -1);
The caveat is, you need UNIQUE constraints on the natural_key columns in each dimension table. With these constraints, Oracle knows that temp is key-preserved in the view you are updating, which is what makes the above syntax OK.
One other caveat: I once encountered a situation where the rows from the SELECT view were not in the same order as the table. The result was that performance tanked, as the update had to revisit each block several times. An ORDER BY temp.rowid in the SELECT view would fix that.

Help for SQL tuning - ORACLE

I have a query that takes data from 5 huge tables. Could you please help me tune its performance?
SELECT DECODE(SIGN((t1.amount - NVL(t2.amount, 0)) - 4.999), 1, NVL(t2.amount, 0), t1.amount) AS amount_1,
t1.element_id,
t1.start_date ,
t1.amount,
NVL(t5.abrev, NULL) AS criteria,
t1.case_id ,
NVL(t5.value, NULL) segment,
add_months(t1.start_date, -1) invoice_date,
NVL((SELECT SUM(b.amount)
FROM TABLE1 a, TABLE3 b
WHERE a.element_id = b.element_id
AND b.date_invoicing < a.start_date
AND t1.element_id = a.element_id),
0) amount_2
FROM TABLE1 t1, TABLE2 t2, TABLE3 t3, TABLE4 t4, TABLE5 t5
WHERE t1.TYPE = 'INVOICE'
AND t2.case_id = t3.case_id
AND t2.invoicing_id = t3.invoicing_id
AND t2.date_unpaid IS NULL
AND t1.element_id = t3.element_id(+)
AND add_months(t1.start_date, -1) <
NVL(t4.DT_FIN_DT(+), SYSDATE)
AND add_months(t1.start_date, -1) >= t4.date_creation(+)
AND t1.case_id = t4.case_id(+)
AND t4.segment = t5.abrev(+)
AND t5.Type(+) = 'CRITERIA_TYPE';
Is there anything wrong in it that could be replaced with something else?
Thanks for your help.
The first thing you must do is to use Explicit Joins. This will separate your joins from your filters and will help you tune this better.
Please check if these joins are correct.
SELECT
DECODE(SIGN((t1.amount - NVL(t2.amount, 0)) - 4.999), 1, NVL(t2.amount, 0), t1.amount) AS amount_1,
t1.element_id,
t1.start_date ,
t1.amount,
NVL(t5.abrev, NULL) AS criteria,
t1.case_id ,
NVL(t5.value, NULL) segment,
add_months(t1.start_date, -1) invoice_date,
NVL
(
(SELECT SUM(b.amount)
FROM TABLE1 a, TABLE3 b
WHERE a.element_id = b.element_id
AND b.date_invoicing < a.start_date
AND t1.element_id = a.element_id),
0) amount_2
FROM
TABLE1 t1
LEFT OUTER JOIN TABLE3 t3
on t1.element_id = t3.element_id
INNER JOIN TABLE2 t2
on t2.invoicing_id = t3.invoicing_id
and t2.case_id = t3.case_id
LEFT OUTER JOIN TABLE4 t4
on t1.case_id = t4.case_id
and add_months(t1.start_date, -1) < NVL(t4.DT_FIN_DT, SYSDATE)
and add_months(t1.start_date, -1) >= t4.date_creation
LEFT OUTER JOIN TABLE5 t5
on t4.segment = t5.abrev
and t5.Type = 'CRITERIA_TYPE'
WHERE t1.TYPE = 'INVOICE'
AND t2.date_unpaid IS NULL;
If they are, then you can do several things, but the best thing is to look at the execution plan.
As others have noted, it's hard to tell without looking at the execution plan.
But... some things I'd be concerned with:
The outer join to TABLE3 in the main query isn't complete, as @TonyAndrews mentioned in his comment above. See the "Incomplete Join Trail" example on Common errors seen when using OUTER-JOIN. This means your query is probably producing the wrong results, but without knowing the full intent of the query and the schema, no one but you could know this for sure.
Updating your query to use the ANSI-style INNER/[LEFT|RIGHT] OUTER syntax from the Oracle-style TableName.ColumnName(+) will help make this more apparent.
The scalar subquery will get run for every row and may be slow (assuming TABLE3 is large). It will be extremely slow if there's not a useful index on TABLE3.element_id and TABLE3.date_invoicing:
NVL((SELECT SUM(b.amount)
FROM TABLE1 a, TABLE3 b
WHERE a.element_id = b.element_id
AND b.date_invoicing < a.start_date
AND t1.element_id = a.element_id),
0) amount_2
I'm also not seeing a need to include TABLE1 again in this subquery. Assuming element_id identifies a single row in TABLE1, it may be better to refactor this into:
NVL((SELECT SUM(b.amount)
FROM TABLE3 b
WHERE t1.element_id = b.element_id
AND b.date_invoicing < t1.start_date),
0) amount_2
Or, you may even be better off refactoring this to use an analytical function (SO question, Oracle documentation) if the criteria for summing the b.amount values is the same as that for including them in the query in the first place:
SUM(b.amount) OVER (PARTITION BY b.element_id) amount_2
Obviously, you currently have different criteria for summing b.amount since you're joining to TABLE3 differently in the main query and the subquery, but I'd imagine that's more a factor of the "Incomplete Join Trail" than by purposeful design (a guess on my part, as I can't tell the intent of the query from the code itself).
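
Whether the scalar subquery really can drop its second reference to TABLE1 is easy to check on a toy data set. The sketch below makes several assumptions: SQLite through Python's sqlite3 module, COALESCE in place of NVL, element_id being unique in TABLE1 (declared as the primary key here), and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (element_id INTEGER PRIMARY KEY, start_date TEXT);
CREATE TABLE table3 (element_id INTEGER, date_invoicing TEXT, amount INTEGER);
INSERT INTO table1 VALUES (1, '2024-03-01'), (2, '2024-03-01');
INSERT INTO table3 VALUES (1, '2024-01-01', 100),
                          (1, '2024-02-01',  50),
                          (1, '2024-04-01', 999);  -- after start_date: excluded
""")

# Original shape: the subquery joins TABLE1 a second time.
SUBQUERY_WITH_REJOIN = """
    SELECT t1.element_id,
           COALESCE((SELECT SUM(b.amount)
                     FROM table1 a, table3 b
                     WHERE a.element_id = b.element_id
                       AND b.date_invoicing < a.start_date
                       AND t1.element_id = a.element_id), 0) AS amount_2
    FROM table1 t1
    ORDER BY t1.element_id
"""

# Simplified shape: correlate TABLE3 directly with the outer row.
SUBQUERY_SIMPLIFIED = """
    SELECT t1.element_id,
           COALESCE((SELECT SUM(b.amount)
                     FROM table3 b
                     WHERE b.element_id = t1.element_id
                       AND b.date_invoicing < t1.start_date), 0) AS amount_2
    FROM table1 t1
    ORDER BY t1.element_id
"""

with_rejoin = conn.execute(SUBQUERY_WITH_REJOIN).fetchall()
simplified = conn.execute(SUBQUERY_SIMPLIFIED).fetchall()
print(with_rejoin, simplified)
```

Both forms give the same sums on this data. If element_id were not unique in TABLE1, the original form would sum over every TABLE1 row with that element_id and the two could differ, so the uniqueness assumption should be verified against the real schema.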
The optimizer may have produced a suboptimal execution plan. Or it may very well be running as fast as possible given the amount of work the database actually needs to do.
Without the explain plan and knowledge of the keys, relations and indexes, it is a bit hard to tell what is going on.
Scalar subqueries in the select list are usually not a good idea when the outer query returns a large number of rows.
The following expressions may prevent the optimizer from using the statistics because of the function calls. Indexes would probably not be used either for the same reason.
AND add_months(t1.start_date, -1) < NVL(t4.DT_FIN_DT(+), SYSDATE)
AND add_months(t1.start_date, -1) >= t4.date_creation(+)
Can't really be more specific than that :)
You need to learn about how to view and understand execution plans. This previous question is a good place to start.
It's a bit odd to nest a SELECT statement inside another one here:
NVL((SELECT SUM(b.amount)
FROM TABLE1 a, TABLE3 b
WHERE a.element_id = b.element_id
AND b.date_invoicing < a.start_date
AND t1.element_id = a.element_id),
0) amount_2
You should rewrite it as a derived table and join it in the FROM clause.