Updated with SQLFiddle.
I have an Oracle query where I need to reduce the number of left outer joins so that it performs efficiently. The current query runs for more than 2 hours, and I want to reduce its complexity by cutting down the number of join operations.
Without the joins, the query runs in 15 minutes, so I want to rewrite the logic. Is there an efficient way to do that?
WITH myquery AS
(
SELECT *
FROM TEST_FILE1
)
SELECT
A.Col3, A.Col1, A.Col2, A.Col4, A.Col5
-- D.CB,
-- NVL(D.CD, 0), NVL(D.CE, 0), NVL(D.EF, 0),
,CASE WHEN V1.Col1 IS NULL THEN 0 ELSE 1 END AS QQ1
,CASE WHEN V2.Col3 IS NULL THEN 0 ELSE 1 END AS QQ2
,CASE WHEN V3.Col1 IS NULL THEN 0 ELSE 1 END AS QQ3
,CASE WHEN V4.Col3 IS NULL THEN 0 ELSE 1 END AS QQ4
,CASE WHEN V5.Col1 IS NULL THEN 0 ELSE 1 END AS QQ5
,CASE WHEN V6.Col3 IS NULL THEN 0 ELSE 1 END AS QQ6
,CASE WHEN V7.Col1 IS NULL THEN 0 ELSE 1 END AS QQ7
,CASE WHEN V8.Col3 IS NULL THEN 0 ELSE 1 END AS QQ8
FROM (
SELECT Col3, Col1, Col2, Col4, Col5
FROM (
SELECT distinct Col3
FROM myquery
) A1
CROSS JOIN (
SELECT distinct Col1
FROM myquery
) A2
CROSS JOIN (
SELECT distinct Col2
FROM myquery
) A3
CROSS JOIN (
SELECT distinct Col4
FROM myquery
) A4
CROSS JOIN (
SELECT distinct Col5
FROM myquery
) A5
WHERE Col3 = 42
) A
LEFT JOIN myquery D on NVL(D.Col3, '-') = NVL(A.Col3, '-') AND NVL(D.Col1, '-') = NVL(A.Col1, '-')
AND NVL(D.Col2, '-') = NVL(A.Col2, '-') AND NVL(D.Col4, '-') = NVL(A.Col4, '-') AND NVL(D.Col5,
'-') = NVL(A.Col5, '-')
LEFT JOIN (
SELECT distinct Col1, Col3, Col5
FROM myquery
) V1 on V1.Col1 = A.Col1 AND V1.Col3 = A.Col3 AND V1.Col5 = A.Col5
LEFT JOIN (
SELECT distinct Col3, Col5, Col2
FROM myquery
) V2 on V2.Col3 = A.Col3 AND V2.Col5 = A.Col5 AND V2.Col2 = A.Col2
LEFT JOIN (
SELECT distinct Col3, Col5, Col1, Col2
FROM myquery
) V3 on V3.Col3 = A.Col3 AND V3.Col5 = A.Col5 AND V3.Col1 = A.Col1 AND V3.Col2 = A.Col2
LEFT JOIN (
SELECT distinct Col3, Col5, Col2
FROM myquery
WHERE Col1 in ('Bert','Myra')
) V4 on V4.Col3 = A.Col3 AND V4.Col5 = A.Col5 AND V4.Col2 = A.Col2
LEFT JOIN (
SELECT distinct Col1, Col3
FROM myquery
) V5 on V5.Col1 = A.Col1 AND V5.Col3 = A.Col3
LEFT JOIN (
SELECT distinct Col3, Col2
FROM myquery
) V6 on V6.Col3 = A.Col3 AND V6.Col2 = A.Col2
LEFT JOIN (
SELECT distinct Col3, Col1, Col2
FROM myquery
) V7 on V7.Col3 = A.Col3 AND V7.Col1 = A.Col1 AND V7.Col2 = A.Col2
LEFT JOIN (
SELECT distinct Col3, Col2
FROM myquery
WHERE Col1 in ('Bert','Myra')
) V8 on V8.Col3 = A.Col3 AND V8.Col2 = A.Col2
So far I was thinking of using an analytic window function, but I didn't get the desired output. Any leads will be highly appreciated.
Here is my input data for the TEST_FILE1 table:
+------+------+------+------+------+
| COL1 | COL2 | COL3 | COL4 | COL5 |
+------+------+------+------+------+
| Bert | "M" | 42 | 68 | 166 |
| Carl | "M" | 32 | 70 | 155 |
| Dave | "M" | 39 | 72 | 167 |
| Elly | "F" | 30 | 66 | 124 |
| Fran | "F" | 33 | 66 | 115 |
| Hank | "M" | 30 | 71 | 158 |
| Jake | "M" | 32 | 69 | 143 |
| Luke | "M" | 34 | 72 | 163 |
| Neil | "M" | 36 | 75 | 160 |
| Page | "F" | 31 | 67 | 135 |
| Alex | "M" | 41 | 74 | 170 |
| Gwen | "F" | 26 | 64 | 121 |
| Ivan | "M" | 53 | 72 | 175 |
| Kate | "F" | 47 | 69 | 139 |
| Myra | "F" | 23 | 62 | 98 |
| Omar | "M" | 38 | 70 | 145 |
| Quin | "M" | 29 | 71 | 176 |
| Ruth | "F" | 28 | 65 | 131 |
+------+------+------+------+------+
From this table I want to create every possible combination of the distinct values of each column by applying a cross join. With my filter Col3 = 42 it produces 7776 records, because I want all possible combinations for that column value only.
With these combinations I want to check which column combinations are missing (NULL), using many left outer joins.
Output(partial):
+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| COL3 | COL1 | COL2 | COL4 | COL5 | QQ1 | QQ2 | QQ3 | QQ4 | QQ5 | QQ6 | QQ7 | QQ8 |
+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+
| 42 | Page | "F" | 68 | 176 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Alex | "F" | 62 | 143 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Fran | "M" | 66 | 175 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Omar | "F" | 70 | 176 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Elly | "M" | 72 | 124 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Quin | "M" | 64 | 160 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Omar | "M" | 64 | 158 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Kate | "F" | 62 | 176 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Neil | "F" | 69 | 145 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Dave | "F" | 62 | 163 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | Ruth | "M" | 70 | 115 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Bert | "M" | 65 | 121 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 42 | Bert | "M" | 72 | 145 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 42 | Omar | "M" | 62 | 158 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 42 | Ruth | "M" | 75 | 131 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
+------+------+------+------+------+-----+-----+-----+-----+-----+-----+-----+-----+
When checking whether data exists in a table, we use EXISTS or IN, not JOIN (SELECT DISTINCT ...). Hence this is the query I'd probably come up with:
WITH myquery AS
(
SELECT * FROM TEST_FILE1
)
, a as
(
select col1, col2, 42 as col3, col4, col5
from
(
(select distinct col1 from myquery)
cross join
(select distinct col2 from myquery)
cross join
(select distinct col4 from myquery)
cross join
(select distinct col5 from myquery)
)
)
select
a.col1, a.col2, a.col3, a.col4, a.col5,
case when (col1, col3, col5) in (select col1, col3, col5 from myquery ) then 1 else 0 end as v1,
case when (col2, col3, col5) in (select col2, col3, col5 from myquery ) then 1 else 0 end as v2,
case when (col1, col2, col3, col5) in (select col1, col2, col3, col5 from myquery ) then 1 else 0 end as v3,
case when (col2, col3, col5) in (select col2, col3, col5 from myquery where col1 in ('Bert', 'Myra')) then 1 else 0 end as v4,
case when (col1, col3) in (select col1, col3 from myquery ) then 1 else 0 end as v5,
case when (col2, col3) in (select col2, col3 from myquery ) then 1 else 0 end as v6,
case when (col1, col2, col3) in (select col1, col2, col3 from myquery ) then 1 else 0 end as v7,
case when (col2, col3) in (select col2, col3 from myquery where col1 in ('Bert', 'Myra')) then 1 else 0 end as v8
from a
order by a.col1, a.col2, a.col3, a.col4, a.col5;
If your real query here: WITH myquery AS (...) is more than a mere SELECT * FROM TEST_FILE1, you may want to use the /*+ MATERIALIZE */ hint in order to speed up access to it.
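Oracle supports row-value IN comparisons like the ones above natively. As a quick sanity check of the pattern outside Oracle, here is a minimal sketch using Python's sqlite3 module (SQLite has supported row values since version 3.15); the table mirrors TEST_FILE1 with a few sample rows, and a single QQ5-style flag stands in for the full set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test_file1 (col1 TEXT, col2 TEXT, col3 INT, col4 INT, col5 INT)")
cur.executemany(
    "INSERT INTO test_file1 VALUES (?, ?, ?, ?, ?)",
    [("Bert", "M", 42, 68, 166),
     ("Carl", "M", 32, 70, 155),
     ("Elly", "F", 30, 66, 124)],
)

# Row-value IN replaces LEFT JOIN (SELECT DISTINCT ...) plus a NULL check:
# the flag is 1 exactly when the (col1, col3) pair exists in the table.
cur.execute("""
    SELECT a.col1, a.col3,
           CASE WHEN (a.col1, a.col3) IN (SELECT col1, col3 FROM test_file1)
                THEN 1 ELSE 0 END AS qq5
    FROM (SELECT DISTINCT col1, 42 AS col3 FROM test_file1) a
    ORDER BY a.col1
""")
result = cur.fetchall()
print(result)
```

Only Bert actually has Col3 = 42 in the sample data, so only his flag comes back as 1.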
I have two tables:
t1
col1 col2
a a
a c
b a
b d
c a
c d
t2
team game
mazs 1
mazs 2
doos 1
bahs 3
...
t2 is a very long table with many teams and games, whereas t1 is shown in its entirety - a table with 6 rows, featuring combinations of the letters a,b,c,d. Note that t1 is not a fully exhaustive list of a,b,c,d pairings, only the pairings in the 6 rows that appear.
I'd like to create a table that looks like this:
output
team game col1 col2
mazs 1 a a
mazs 1 a c
mazs 1 b a
mazs 1 b d
mazs 1 c a
mazs 1 c d
mazs 2 a a
mazs 2 a c
mazs 2 b a
mazs 2 b d
mazs 2 c a
mazs 2 c d
What's going on here is that for each row in t2, there are 6 rows in output, one row for each of the col1, col2 pairings from t1.
t1 and t2 are created on my end from the following queries:
SELECT col1, col2 FROM sometable GROUP BY col1, col2
SELECT DISTINCT team, game FROM anothertable
The first query creates t1 and the second query creates t2.
It is called a CROSS JOIN; see the example below:
With t1 as (
select 'a' col1, 'a' col2 union all
select 'a' col1, 'c' col2 union all
select 'b' col1, 'a' col2 union all
select 'b' col1, 'd' col2 union all
select 'c' col1, 'a' col2 union all
select 'c' col1, 'd' col2),
t2 as (
select 'mazs' team, 1 game union all
select 'mazs' team, 2 game union all
select 'doos' team, 1 game union all
select 'bahs' team, 3 game
)
SELECT * FROM t2 cross join t1;
Output:
+------+------+------+------+
| team | game | col1 | col2 |
+------+------+------+------+
| mazs | 1 | a | a |
| mazs | 1 | a | c |
| mazs | 1 | b | a |
| mazs | 1 | b | d |
| mazs | 1 | c | a |
| mazs | 1 | c | d |
| mazs | 2 | a | a |
| mazs | 2 | a | c |
| mazs | 2 | b | a |
| mazs | 2 | b | d |
| mazs | 2 | c | a |
| mazs | 2 | c | d |
| doos | 1 | a | a |
| doos | 1 | a | c |
| doos | 1 | b | a |
| doos | 1 | b | d |
| doos | 1 | c | a |
| doos | 1 | c | d |
| bahs | 3 | a | a |
| bahs | 3 | a | c |
| bahs | 3 | b | a |
| bahs | 3 | b | d |
| bahs | 3 | c | a |
| bahs | 3 | c | d |
+------+------+------+------+
I have sql table as follows
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 10   |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I would like to transform the above table into the one below, where the transformation is
col4 = col4 - (col4 % min(col2) group by col1)
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 9    |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I could read the above table in application code and do the transformation manually, but I was wondering whether it is possible to offload the transformation to SQL.
Just run a simple select query for this:
select col1, col2, col3,
col4 - (col4 % min(col2) over (partition by col1))
from t;
There is no need to actually modify the table.
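A runnable sketch of this analytic-function approach, using Python's sqlite3 (window functions need SQLite 3.25 or newer; note that Oracle spells the modulo operator MOD(col4, ...) rather than %):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE t (col1 TEXT, col2 INT, col3 TEXT, col4 INT);
    INSERT INTO t VALUES ('a',3,'d1',10), ('a',6,'d2',15),
                         ('b',2,'d2',8),  ('b',30,'d1',50);
""")

# MIN(col2) OVER (PARTITION BY col1) gives each row its group minimum
# without collapsing the rows, so no GROUP BY or self-join is needed.
cur.execute("""
    SELECT col1, col2, col3,
           col4 - (col4 % MIN(col2) OVER (PARTITION BY col1)) AS col4
    FROM t
    ORDER BY col1, col2
""")
result = cur.fetchall()
print(result)
```

Only the (a, 3, d1) row changes: 10 % 3 = 1, so 10 - 1 = 9; the other rows are already multiples of their group minimum.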
You can use a multi-table UPDATE to achieve your desired result, joining your table to a table of MIN(col2) values:
UPDATE table1
SET col4 = col4 - (col4 % t2.col2min)
FROM (SELECT col1, MIN(col2) AS col2min
FROM table1
GROUP BY col1) t2
WHERE table1.col1 = t2.col1
Output:
col1 col2 col3 col4
a 3 d1 9
a 6 d2 15
b 2 d2 8
b 30 d1 50
Demo on dbfiddle
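UPDATE ... FROM is PostgreSQL/SQL Server syntax; on engines without it (including Oracle and SQLite) a correlated subquery achieves the same update. A minimal sketch with Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 INT, col3 TEXT, col4 INT);
    INSERT INTO table1 VALUES ('a',3,'d1',10), ('a',6,'d2',15),
                              ('b',2,'d2',8),  ('b',30,'d1',50);
""")

# The correlated subquery recomputes MIN(col2) for each row's col1 group,
# replacing the joined derived table t2 from the UPDATE ... FROM version.
cur.execute("""
    UPDATE table1
    SET col4 = col4 - (col4 % (SELECT MIN(t2.col2)
                               FROM table1 t2
                               WHERE t2.col1 = table1.col1))
""")
cur.execute("SELECT col1, col4 FROM table1 ORDER BY col1, col2")
result = cur.fetchall()
print(result)
```

The subquery only reads col1 and col2, which the UPDATE never modifies, so re-evaluating it per row is safe.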
I have 4 tables in an Oracle database with complex relationships, and they do not have useful primary keys.
TableA
+------+------+------+------+------+-----------------+
| ColA | ColX | ColY | ColZ | ColZa| A |
+------+------+------+------+------+-----------------+
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 |
| k10 | a2 | f3 | g1 | z3 | 2018-02-09 |
| k10 | a | b | c | d | 2018-02-03 |
| k | a | b | c1 | z2 | 2018-02-01 |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 |
+------+------+------+------+------+-----------------+
TableB
+------+------+------+------+------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| B |
+------+------+------+------+------+----------------+
| e | a3 | f | g1 | i | 2018-02-03 |
| e3 | a1 | f1 | g3 | d2 | 2018-02-04 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-08 |
| e4 | a4 | f2 | g2 | i2 | 2018-02-07 |
| e5 | a1 | f1 | g1 | d2 | 2018-02-06 |
| k9 | a1 | c1 | g1 | d2 | 2018-02-22 |
+------+------+------+------+------+----------------+
TableC
+------+------+------+----------------+
| ColA | ColX | ColY | C |
+------+------+------+----------------+
| ab | c2 | c2 | cx |
| k9 | a1 | c1 | cy |
| cd | a2 | c3 | cy |
| ef | c2 | c4 | cz |
| ef | c2 | c2 | cz |
+------+------+------+----------------+
TableD
+------+------+------+----------------+
| ColA | ColX | ColY | D |
+------+------+------+----------------+
| e | a | f | dx |
| e1 | a | a | dy |
| e2 | a1 | a1 | dz |
+------+------+------+----------------+
Some business logic requires me to select and combine data from TableA and TableB
The Problem:
Fetch the records ColA, ColX, ColY, ColZ, ColZa, A, B from TableA and/or TableB for cases where the pseudo key ColA_ColX_ColY has the value ColZ = 'g1', merging on ColA | ColX | ColY | ColZ | ColZa.
I use the word 'pseudo' here because it is not really a key; it is just a means of identifying the records of interest in TableA and TableB.
To construct a valid key, count(ColY) must be 1 for the ColX value in TableC and TableD. (This is actually the case in all four tables if you only consider distinct values, but I am supposed to use only TableC and TableD since they are more explicit.)
The process:
In the result table below, I should get row 1 of TableA because 'a1' has count(ColY) = 1 in TableC, but I ignored row 1 in TableB and row 3 in TableA because count(ColY) is not equal to 1 in either TableC or TableD.
Now that I have the value 'a1' from TableC.ColX matching my criteria, I select all records in TableA and TableB where ColX = 'a1', ColY = 'c1' and ColA = 'k9'.
My desired result
+------+------+------+------+------+-----------------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| A | B |
+------+------+------+------+------+-----------------+----------------|
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 | [null] |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 | [null] |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 | [null] |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 | [null] |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 | 2018-02-08 |
| k9 | a1 | c1 | g4 | d2 | [null] | 2018-02-22 |
+------+------+------+------+------+-----------------+----------------+
So, I wrote a query similar to
select a.ColX, a.ColY, a.ColZ, a.ColZa, a.A, b.B from TableA a FULL OUTER JOIN TableB b ON a.ColX=b.ColX AND a.ColY=b.ColY AND a.ColZ=b.ColZ
where (
a.ColX IN
(select ColX from TableA where
ColX IN
(select ColX from TableC group by ColX HAVING count(ColY)=1) and
ColX in
(select distinct ColX from TableB where ColZ = 'g1' and B > trunc(sysdate) - 365)
group by ColX having count(distinct ColY)=1)
OR
b.ColX IN
(select ColX from TableA where
ColX IN
(select ColX from TableC group by ColX HAVING count(ColY)=1) and
ColX in
(select distinct ColX from TableB where ColZ = 'g1' and B > trunc(sysdate) - 365)
group by ColX having count(distinct ColY)=1));
I have no control over the data model here. How do I make my query work?
TableA and TableB contain around 100,000 records each, and TableC and TableD up to a million.
SQL is not my area of expertise, and I really hope I am not going too far off the mark here.
I didn't understand what your query is supposed to do, but as a pure refactoring exercise I get this:
with whatever as
( select colx
from tablea
where colx in
( select colx
from tablec
group by colx having count(coly) = 1
union all
select colx
from tableb
where colz = 'g1'
and b > trunc(sysdate) - 365 )
group by colx
having count(distinct coly) = 1 )
select a.colx, a.coly, a.colz, a.colza, a.a, b.b
from tablea a
full outer join tableb b
on a.colx = b.colx
and a.coly = b.coly
and a.colz = b.colz
join whatever w
on w.colx in (a.colx, b.colx);
I need to find the number of col2 values whose col3 is 'Y' 100% of the time across every col1 set (A, B, C, ...) they appear in. In this case, B1 and D1 meet this criterion, so N = 2. Solutions in pandas or SQL are helpful (both are ideal).
| col1 | col2 | col3 | col4 | col5 |
|------|-------|-------|-------|-------|
| A | A1 | N | 1 | 256 |
| A | B1 | Y | 2 | 3 |
| A | C1 | N | 3 | 323 |
| B | F1 | N | 1 | 89 |
| B | B1 | Y | 2 | 256 |
| C | D1 | Y | 1 | 3 |
| D | A1 | N | 1 | 32 |
| D | C1 | Y | 2 | 893 |
Something like this in Python pandas:
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count()).sum()
Out[568]: 2
More detail :
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count())
Out[569]:
col2
A1 False
B1 True
C1 False
D1 True
F1 False
Name: col3, dtype: bool
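The same count can be sketched without pandas, using only the standard library; the rows below are the (col1, col2, col3) triples from the question:

```python
from collections import defaultdict

rows = [
    ("A", "A1", "N"), ("A", "B1", "Y"), ("A", "C1", "N"),
    ("B", "F1", "N"), ("B", "B1", "Y"), ("C", "D1", "Y"),
    ("D", "A1", "N"), ("D", "C1", "Y"),
]

# Collect every col3 flag seen for each col2 value.
flags = defaultdict(list)
for col1, col2, col3 in rows:
    flags[col2].append(col3)

# Count the col2 values whose flags are 'Y' every single time.
n = sum(1 for vals in flags.values() if all(v == "Y" for v in vals))
print(n)
```

B1 is 'Y' in both its rows and D1 in its only row, so n comes out as 2, matching the pandas result.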
I don't see what col1 has to do with this. You can do this with a SQL query:
select count(*)
from (select col2
      from t
      group by col2
      having min(col3) = max(col3) and min(col3) = 'Y'
     ) t;
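A runnable check of the grouped query with Python's sqlite3 (the aggregate filters go in GROUP BY ... HAVING); the data is as in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT);
    INSERT INTO t VALUES
        ('A','A1','N'), ('A','B1','Y'), ('A','C1','N'),
        ('B','F1','N'), ('B','B1','Y'), ('C','D1','Y'),
        ('D','A1','N'), ('D','C1','Y');
""")

# A col2 group passes only when every col3 value is 'Y'
# (min = max = 'Y'); the outer query counts the passing groups.
cur.execute("""
    SELECT COUNT(*)
    FROM (SELECT col2
          FROM t
          GROUP BY col2
          HAVING MIN(col3) = MAX(col3) AND MIN(col3) = 'Y') s
""")
result = cur.fetchone()[0]
print(result)
```

Only B1 and D1 survive the HAVING clause, so the count is 2, agreeing with the pandas answer above.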