Transform a correlated subquery into a join

I want to express this:
SELECT
a.*
,b.timestamp_col
FROM weird_data_source a
LEFT JOIN weird_data_source b
ON a.id = b.id
AND b.timestamp_col = (
SELECT
MAX(sub.timestamp_col)
FROM weird_data_source sub
WHERE sub.id = a.id
AND sub.date_col <= a.date_col
AND sub.timestamp_col < a.timestamp_col
)
A couple of notes about the data:
date_col and timestamp_col don't represent the same thing.
I'm not kidding... the data really is structured like this.
But the subquery is invalid: Netezza cannot handle the < operator in a correlated subquery. For the life of me I cannot figure out an alternative. How can I get around this?
My gut tells me this could be done with a join, but I haven't managed it yet.
There are a dozen or so similar questions, but none of them seem to deal with this type of inequality.

This should get you pretty close. You will get duplicate rows if two rows have exactly the same timestamp_col and otherwise meet the criteria, but apart from that you should be good:
SELECT
a.id,
a.some_other_columns, -- Because we NEVER use SELECT *
c.timestamp_col -- latest earlier row for a, or NULL if there is none
FROM
weird_data_source a
-- c: candidate rows that meet the criteria for a
LEFT OUTER JOIN weird_data_source c ON
c.id = a.id AND
c.date_col <= a.date_col AND
c.timestamp_col < a.timestamp_col
-- d: any row that meets the same criteria but is later than c
LEFT OUTER JOIN weird_data_source d ON
d.id = a.id AND
d.date_col <= a.date_col AND
d.timestamp_col < a.timestamp_col AND
d.timestamp_col > c.timestamp_col
WHERE
d.id IS NULL
The query is basically looking for a matching row where no other matching row is found with a greater timestamp_col value - hence the d.id IS NULL. That column will only be NULL if no match is found.
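To check the anti-join idea without a Netezza instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The sample rows are invented for illustration, and SQLite (unlike Netezza) accepts the correlated subquery, which makes it possible to compare both forms side by side. The join version selects the candidate row's timestamp_col (alias c).

```python
import sqlite3

# In-memory database with a few made-up rows (assumption: not the asker's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weird_data_source (id INTEGER, date_col TEXT, timestamp_col INTEGER);
    INSERT INTO weird_data_source VALUES
        (1, '2020-01-01', 10),
        (1, '2020-01-02', 20),
        (1, '2020-01-03', 30),
        (1, '2020-01-05', 25),
        (2, '2020-01-01', 5);
""")

# The original formulation with a correlated subquery.
CORRELATED = """
    SELECT a.id, a.timestamp_col, b.timestamp_col
    FROM weird_data_source a
    LEFT JOIN weird_data_source b
      ON a.id = b.id
     AND b.timestamp_col = (
         SELECT MAX(sub.timestamp_col)
         FROM weird_data_source sub
         WHERE sub.id = a.id
           AND sub.date_col <= a.date_col
           AND sub.timestamp_col < a.timestamp_col)
    ORDER BY a.id, a.timestamp_col
"""

# The join-only formulation: keep candidate c only if no later candidate d exists.
ANTI_JOIN = """
    SELECT a.id, a.timestamp_col, c.timestamp_col
    FROM weird_data_source a
    LEFT JOIN weird_data_source c
      ON c.id = a.id
     AND c.date_col <= a.date_col
     AND c.timestamp_col < a.timestamp_col
    LEFT JOIN weird_data_source d
      ON d.id = a.id
     AND d.date_col <= a.date_col
     AND d.timestamp_col < a.timestamp_col
     AND d.timestamp_col > c.timestamp_col
    WHERE d.id IS NULL
    ORDER BY a.id, a.timestamp_col
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN).fetchall()
print(anti_join_rows)
```

On this sample data both queries return the same rows. Rows of a with no earlier candidate survive too, because both c and d are NULL for them.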

Related

SQL JOIN Condition moved to with where clause produces differences

Query 1
select count(1)
from sdb_snmp_sysdata s
left join sdb_snmp_entphysicaltable e on s.source = e.source and e.class = 3 -- condition in the ON clause
left join SDB_DF_DEVICE_DNS dns on dns.source = s.source
left join sdb_fdb_node f on upper(f.oldnodeid) = upper(dns.dns_name)
where (regexp_like(s.descr, 'NFXS-F FANT-F ALCATEL-LUCENT|Motorola APEX3000')
or regexp_like(e.descr, 'Motorola BSR64000 HD 100A Redundant Chassis|AS2511-RJ chassis')
or trim(e.ModelName) in ('RFGW1', 'ARCT01949', 'ARCT03253', 'UBR10012', 'WS-C3750-48TS-S', 'WS-C3750V2-48TS-S')
or e.name like '%Nexus5596 Chassis%')
Query 2:
select count(1)
from sdb_snmp_sysdata s
left join sdb_snmp_entphysicaltable e on s.source = e.source
left join SDB_DF_DEVICE_DNS dns on dns.source = s.source
left join sdb_fdb_node f on upper(f.oldnodeid) = upper(dns.dns_name)
where (regexp_like(s.descr, 'NFXS-F FANT-F ALCATEL-LUCENT|Motorola APEX3000')
or regexp_like(e.descr, 'Motorola BSR64000 HD 100A Redundant Chassis|AS2511-RJ chassis')
or trim(e.ModelName) in ('RFGW1', 'ARCT01949', 'ARCT03253', 'UBR10012', 'WS-C3750-48TS-S', 'WS-C3750V2-48TS-S')
or e.name like '%Nexus5596 Chassis%') and e.class = 3 -- condition in the WHERE clause
The above two queries return a different number of rows when the e.class condition is moved from the ON clause to the WHERE clause. I am unable to figure out why. Any help is appreciated.
My understanding:
In query 1, the left outer join between sysdata and entphysicaltable is a hash join that happens after a full scan of the individual tables.
In query 2, the join happens after entphysicaltable is reduced to the records with entphysicaltable.class = 3.
To me the two queries mean the same thing, but they return different results.
I can relate to this question; I would like to know a concrete reason.

The best explanation is a small example. Take two tables:
TABLE A
C1
----------
1
2
TABLE B
C1 C2
---------- -
1 x
Then the query with the filter B.c2 = 'x' in the ON clause returns 2 rows
select *
from A left outer join B
on A.c1 = B.c1 and B.c2 = 'x';
C1 C1 C2
---------- ---------- --
1 1 x
2
whereas when the filter is moved to the WHERE clause, only one row is returned
select *
from A left outer join B
on A.c1 = B.c1
WHERE B.c2 = 'x';
C1 C1 C2
---------- ---------- --
1 1 x
The WHERE clause simply overrules the OUTER JOIN logic for missing rows: we all know that NULL is not equal to 'x', so the second row is discarded.
BTW, if you see constructs like B.c2(+) = 'x' in the old join syntax, this is the very same topic.

If I read your question right, then it simply comes down to how a LEFT JOIN works.
A LEFT (outer) JOIN keeps every row from the left side and joins it to the matching rows on the right side.
Because it is an outer join, when there is no match on the right it fills the right-hand columns with NULL.
However, when you put your constraints in the WHERE clause, you tell the query engine to filter out those NULL rows, because they do not satisfy the WHERE condition.
If you put the filters in the ON clause, the query engine will not remove the NULL rows.
This happens because the WHERE is 'executed' after the JOINs.
That's why you get a different number of rows: an OUTER join behaves differently depending on whether the filter is in the ON clause or the WHERE clause.
So if you want the join to include NULL rows, you'll need to use the ON clause.
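The effect described above can be reproduced with a small sketch using Python's built-in sqlite3 module. The two tiny tables and their contents are made up for illustration; SQLite stands in for Oracle here. The third query shows that a WHERE filter on the right-hand table makes the LEFT JOIN return the same rows as an INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER);
    CREATE TABLE b (c1 INTEGER, c2 TEXT);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 'x');
""")

# Filter in the ON clause: unmatched rows of a are kept, padded with NULLs.
on_rows = conn.execute("""
    SELECT a.c1, b.c1, b.c2
    FROM a LEFT JOIN b ON a.c1 = b.c1 AND b.c2 = 'x'
    ORDER BY a.c1
""").fetchall()

# Filter in the WHERE clause: the NULL-padded row fails b.c2 = 'x' and is dropped.
where_rows = conn.execute("""
    SELECT a.c1, b.c1, b.c2
    FROM a LEFT JOIN b ON a.c1 = b.c1
    WHERE b.c2 = 'x'
    ORDER BY a.c1
""").fetchall()

# For comparison: a plain inner join with the same filter.
inner_rows = conn.execute("""
    SELECT a.c1, b.c1, b.c2
    FROM a INNER JOIN b ON a.c1 = b.c1 AND b.c2 = 'x'
    ORDER BY a.c1
""").fetchall()

print(on_rows)     # two rows, the second padded with None
print(where_rows)  # one row
```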

Oracle left outer join stops returning results when 2-table condition added

I have an Oracle SQL query with a whole string of inner and left joins. However, when I add the last left join condition, it stops returning results. This may end up being a simple answer like "you can't outer join on columns from 2 tables", but I can't find any such rule for Oracle, and there are plenty of examples showing the opposite. The SQL query is:
FROM a, b,
c, d,
e, f,
g, h,
(SELECT id5 FROM
some_table WHERE
conditions) i,
(SELECT id7, type FROM
some_other_table WHERE
conditions) j
WHERE b.time in (range) AND
b.count <> 0 AND
b.id1 = e.id1 AND
e.type = g.type AND
g.type2 = f.type2 AND
b.id2 = a.id2(+) AND
b.time = a.time(+) AND
b.id3 = c.id3(+) AND
b.time = c.time(+) AND
c.id4 = d.id4(+) AND
c.time = d.time(+) AND
c.id5 = i.id5(+) AND
c.time = h.time(+) AND
c.id6 = h.id6(+) AND
h.id7 = j.id7(+); --AND
--e.type = j.type(+);
When I uncomment the final condition, no results are returned. Since this is supposed to be an outer join, that shouldn't happen. So, something in here must be making it not act like an outer join?
Is there a typo or error in here somewhere? Is there an Oracle rule I am breaking? Anything that could be solved by switching to ANSI join format?
Thanks

Either you've missed something in translating your query to the simple form or I've missed something in my manipulation, but it looks like the query may not be doing what you think it should. Rewriting in standard ANSI form is more revealing:
FROM a
right outer join b
on b.id2 = a.id2
AND b.time = a.time
right outer join c
on c.id3 = b.id3
AND c.time = b.time
left outer join d
on d.id4 = c.id4
AND d.time = c.time
join e
on e.id1 = b.id1
cross join f
join g
on g.type = e.type
AND g.type2 = f.type2
left outer join h
on h.time = c.time
AND h.id6 = c.id6
left outer join(
SELECT id5 FROM
some_table WHERE
conditions) i
on i.id5 = c.id5
left outer join(
SELECT id7, type FROM
some_other_table WHERE
conditions) j
on j.id7 = h.id7
and j.type = e.type --> the criteria in question
where b.time in (range)
AND b.count <> 0;
Does this look right to you? You don't mention the RIGHT OUTER joins but I'm hoping you just forgot. You do mention the INNER joins, but table f has no join criteria at all so I've used a CROSS join, hoping here also that this is your intention.
Is the join criteria for table e as it should be? According to the pattern you have set, I would expect to see "id5" here instead of "id1". Of course, you have changed all the names to submit a simplified example, so this may be meaningless. So the first thing I would suggest is that you rewrite your original code to the ANSI form like I did, using the real table and column names. You may see something.
You are correct in that adding the marked criteria should have no effect on the number of rows in the result set. That being the case, there is something else going on.
To find out what, comment out the entire last join. If you see something screwy, keep commenting out tables to get to where the problem occurs. If everything looks good, execute just the nested query that forms "table" j. I can't think of anything it might contain that could cause the situation as you explain it, but examine it anyway.
Finally, if all else fails, form queries with just tables e and j and then with just tables h and j (with their corresponding join criteria). See what happens.
Then get back here and explain to us how the problem was somewhere else the whole time. :)

The problem (as suggested in the comments on the question) was that h.id7 = j.id7(+) AND e.type = j.type(+) is functionally equivalent to a.id1 = b.id1(+) AND c.id2 = b.id2(+): one table is outer-joined to two different tables. The Oracle join syntax is not precise enough to see that, due to the other join conditions, this is not a full outer join, so it is not allowed.
This should have caused an error, but something else in the query suppressed it; it is not clear to me how that happened. In any case, switching to ANSI format allowed me to specify the join conditions more precisely, and the query worked as expected.

What is the equivalent of (+)<2 in ANSI SQL?

I want to turn this
TableA.ColumnA(+)<2
into ANSI SQL.
I already tried:
(TableA.ColumnA<2 OR TableA.ColumnA IS NULL)
It missed one row, despite the fact that its ColumnA is (null).
Edit (more context):
Here is the query
SELECT * FROM a, c
WHERE a.status(+)<2
AND a.rank(+)=1
AND c.id=a.id(+)

Give this a try:
SELECT * FROM c LEFT JOIN a
ON c.id = a.id
AND a.status < 2
AND a.rank = 1
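One plausible reason the OR ... IS NULL attempt lost a row: a row of c that does have a partner in a, but with status >= 2, is joined first and then rejected by the WHERE clause, whereas the (+) form (and the ON-clause form above) keeps it with NULLs. A minimal sketch with Python's built-in sqlite3 module follows; the sample data is invented and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (id INTEGER);
    CREATE TABLE a (id INTEGER, status INTEGER, "rank" INTEGER);
    INSERT INTO c VALUES (1), (2), (3);
    -- id 1 matches fully, id 2 has status >= 2, id 3 has no row in a at all
    INSERT INTO a VALUES (1, 1, 1), (2, 5, 1);
""")

# All outer-join conditions in the ON clause: every row of c survives.
on_rows = conn.execute("""
    SELECT c.id, a.status
    FROM c LEFT JOIN a
      ON c.id = a.id AND a.status < 2 AND a."rank" = 1
    ORDER BY c.id
""").fetchall()

# The attempted rewrite: status test moved to WHERE with an IS NULL escape hatch.
where_rows = conn.execute("""
    SELECT c.id, a.status
    FROM c LEFT JOIN a
      ON c.id = a.id AND a."rank" = 1
    WHERE (a.status < 2 OR a.status IS NULL)
    ORDER BY c.id
""").fetchall()

print(on_rows)     # id 2 is present, with status None
print(where_rows)  # id 2 is missing
```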

Including a condition dynamically based on another condition

I have a query as below
select --------
from table a
left outer join ....c
where
(a.column='123') and (c.column='456')
I would like to include "(c.column='456')" only when (a.column='123') is not null.
How do I do that in a single query? Or do I need to write two separate queries?

Should be pretty straightforward:
select --------
from table
left outer join....
where (Condition A IS NULL) OR (condition A AND condition B)
UPDATED: For your conditions:
where (a.column is null) or (a.column='123' and c.column='456')
It will include a row if its a.column is null, or if both a.column and c.column have the required values.
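To see which rows that predicate lets through, here is a small sketch with Python's built-in sqlite3 module. The tables, the join key a_id, and the column name col are hypothetical stand-ins for the elided parts of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col TEXT);
    CREATE TABLE c (a_id INTEGER, col TEXT);
    INSERT INTO a VALUES (1, NULL), (2, '123'), (3, '123'), (4, '999');
    INSERT INTO c VALUES (1, '000'), (2, '456'), (3, '000'), (4, '456');
""")

# The c.col test only applies to rows where a.col is set to '123';
# rows with a.col IS NULL pass regardless of c.col.
kept_ids = [row[0] for row in conn.execute("""
    SELECT a.id
    FROM a LEFT OUTER JOIN c ON c.a_id = a.id
    WHERE (a.col IS NULL) OR (a.col = '123' AND c.col = '456')
    ORDER BY a.id
""")]

print(kept_ids)
```

Note that a row whose a.col is some other non-null value (id 4 here) is dropped as well; whether that is wanted depends on the actual requirement.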

As I understand your requirement, this is the SQL you want:
select distinct cm.credit_amt,e.credit_lifetime,e.credit_desc,e.icom_code,e.entry_hint,
e.credit_id,e.credit_type_id,e.recontract_period,a.class_desc,a.offer_id,
a.offer_class_id
from rti_retention.package a
left outer join rti_retention.offer_class_credit b on (a.offer_id=b.offer_id
and a.offer_class_id=b.offer_class_id
and a.customer_type_id=b.customer_type_id)
left outer join rti_retention.credit_pre_bundle c on (b.credit_id=c.credit_id)
left outer join rti_retention.credit e on (c.credit_id=e.credit_id)
left outer join rti_retention.credit_mix_amount cm on (cm.credit_id=c.credit_id and cm.prod_mix_id=a.prod_mix_id)
where a.offer_class_id not in (1,2,16)
and a.channel_id=5 and a.customer_type_id=1
and a.offer_id='6055'
and c.prod_mix_id = case when (select count(*)
from rti_retention.credit_pre_bundle c1
where c1.prod_mix_id='1000' ) > 1 then '1000' else c.prod_mix_id end
and e.icom_code is not null
There may be some SQL syntax errors: I don't have the full database, so I wrote this from memory and could not test it.

SQL query join conditions

I have a query (an excerpt from a stored procedure) that looks something like this:
SELECT S.name
INTO #TempA
from tbl_Student S
INNER JOIN tbl_StudentHSHistory TSHSH on TSHSH.STUD_PK=S.STUD_PK
INNER JOIN tbl_CODETAILS C
on C.CODE_DETL_PK=S.GID
WHERE TSHSH.Begin_date < @BegDate
Here is the issue: the 2nd inner join and the corresponding WHERE condition should only apply if a certain variable (@UseArchive) is true; I don't want them applied if it is false. Also, certain rows in TSHSH might have no corresponding entries in S. I tried splitting it into 2 separate queries based on @UseArchive, but studio refuses to compile that because of the INTO #TempA statement, saying that there is already an object named #TempA in the database. Can anyone tell me of a way to fix the query or a way to split the queries with the INTO #TempA statement?

Looks like you're asking 2 questions here.
1- How to fix the SELECT INTO issue:
SELECT INTO only works if the target table does not exist. You need to use INSERT INTO...SELECT if the table already exists.
2- Conditional JOIN:
You'll need to do a LEFT JOIN if the corresponding row may not exist. Try this.
SELECT S.name
FROM tbl_Student S
INNER JOIN tbl_StudentHSHistory TSHSH
ON TSHSH.STUD_PK=S.STUD_PK
LEFT JOIN tbl_CODETAILS C
ON C.CODE_DETL_PK=S.GID
WHERE TSHSH.Begin_date < @BegDate
AND CASE WHEN @UseArchive = 1 THEN c.CODE_DETL_PK ELSE 0 END =
CASE WHEN @UseArchive = 1 THEN S.GID ELSE 0 END
Putting the CASE expression in the WHERE clause and not the JOIN clause forces the join to act like an INNER JOIN when @UseArchive is 1 and like a LEFT JOIN when it is not.
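The CASE trick can be tried out with a minimal sketch using Python's built-in sqlite3 module. The tables are simplified, hypothetical versions of tbl_Student and tbl_CODETAILS, the history table is left out, and the T-SQL variable is replaced by a bound parameter named :use_archive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stud_pk INTEGER, name TEXT, gid INTEGER);
    CREATE TABLE codetails (code_detl_pk INTEGER);
    INSERT INTO student VALUES (1, 'Ann', 100), (2, 'Bob', 200);
    INSERT INTO codetails VALUES (100);  -- only Ann has a matching row
""")

QUERY = """
    SELECT s.name
    FROM student s
    LEFT JOIN codetails c ON c.code_detl_pk = s.gid
    WHERE CASE WHEN :use_archive = 1 THEN c.code_detl_pk ELSE 0 END =
          CASE WHEN :use_archive = 1 THEN s.gid ELSE 0 END
    ORDER BY s.name
"""

def names(use_archive):
    """Return the student names selected for the given flag value."""
    rows = conn.execute(QUERY, {"use_archive": use_archive}).fetchall()
    return [row[0] for row in rows]

print(names(1))  # flag on: unmatched students are filtered out, like an INNER JOIN
print(names(0))  # flag off: the comparison is 0 = 0, so every student is kept
```

With the flag on, an unmatched row compares NULL to s.gid, which is not true, so the row is removed; with the flag off, both CASE expressions evaluate to 0.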

I'd replace it with a LEFT JOIN:
LEFT JOIN tbl_CODETAILS C ON @UseArchive = 1 AND C.CODE_DETL_PK=S.GID

You can split the queries and then insert into a temp table easily.
SELECT * INTO #TempA FROM
(
SELECT * FROM Q1
UNION ALL
SELECT * FROM Q2
) T

SELECT S.name
INTO #TempA
from tbl_Student S
INNER JOIN tbl_StudentHSHistory TSHSH
on TSHSH.STUD_PK = S.STUD_PK
INNER JOIN tbl_CODETAILS C
on C.CODE_DETL_PK = S.GID
and @UseArchive = 1
WHERE TSHSH.Begin_date < @BegDate
But putting @UseArchive = 1 in the ON clause of an inner join, as here, is the same as putting it in the WHERE clause.
Your question does not make much sense to me.
So what if certain rows in TSHSH have no corresponding entries in S?
If you want just one of the joins to match:
SELECT S.name
INTO #TempA
from tbl_Student S
LEFT OUTER JOIN tbl_StudentHSHistory TSHSH
on TSHSH.STUD_PK = S.STUD_PK
LEFT OUTER JOIN tbl_CODETAILS C
on C.CODE_DETL_PK = S.GID
and @UseArchive = 1
WHERE TSHSH.Begin_date < @BegDate
and ( TSHSH.STUD_PK is not null or C.CODE_DETL_PK is not null )
