Select from two slightly similar tables with complex conditions


I have 4 tables in an Oracle database with a complex relationship, and they do not have useful primary keys.
TableA
+------+------+------+------+------+-----------------+
| ColA | ColX | ColY | ColZ | ColZa| A |
+------+------+------+------+------+-----------------+
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 |
| k10 | a2 | f3 | g1 | z3 | 2018-02-09 |
| k10 | a | b | c | d | 2018-02-03 |
| k | a | b | c1 | z2 | 2018-02-01 |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 |
+------+------+------+------+------+-----------------+
TableB
+------+------+------+------+------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| B |
+------+------+------+------+------+----------------+
| e | a3 | f | g1 | i | 2018-02-03 |
| e3 | a1 | f1 | g3 | d2 | 2018-02-04 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-08 |
| e4 | a4 | f2 | g2 | i2 | 2018-02-07 |
| e5 | a1 | f1 | g1 | d2 | 2018-02-06 |
| k9 | a1 | c1 | g1 | d2 | 2018-02-22 |
+------+------+------+------+------+----------------+
TableC
+------+------+------+----------------+
| ColA | ColX | ColY | C |
+------+------+------+----------------+
| ab | c2 | c2 | cx |
| k9 | a1 | c1 | cy |
| cd | a2 | c3 | cy |
| ef | c2 | c4 | cz |
| ef | c2 | c2 | cz |
+------+------+------+----------------+
TableD
+------+------+------+----------------+
| ColA | ColX | ColY | D |
+------+------+------+----------------+
| e | a | f | dx |
| e1 | a | a | dy |
| e2 | a1 | a1 | dz |
+------+------+------+----------------+
Some business logic requires me to select and combine data from TableA and TableB.
The Problem:
Fetch the columns ColA, ColX, ColY, ColZ, ColZa, A, B from TableA and/or TableB for every pseudo key ColA_ColX_ColY that has a row with ColZ = 'g1', merging the two tables on ColA | ColX | ColY | ColZ | ColZa.
I use the word 'pseudo' here because it is not really a key; it is just a means of identifying the records of interest in TableA and TableB.
To construct a valid key, count(ColY) must be 1 for a given value of ColX in TableC and TableD (this actually holds in all four tables if you only consider distinct values, but I am supposed to use only TableC and TableD since they are more explicit).
The process:
In the result table below, I should get row1 in TableA because 'a1' has count(ColY) = 1 in TableC, but I ignored row1 in TableB and row3 in TableA because count(ColY) is not equal to 1 in either TableC or TableD.
Now that I have a value 'a1' from TableC.ColX which matches my criteria, I select all records in TableA and TableB where ColX = 'a1' and ColY = 'c1' and ColA = 'k9'.
My desired result
+------+------+------+------+------+-----------------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| A | B |
+------+------+------+------+------+-----------------+----------------|
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 | [null] |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 | [null] |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 | [null] |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 | [null] |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 | 2018-02-08 |
| k9 | a1 | c1 | g4 | d2 | [null] | 2018-02-22 |
+------+------+------+------+------+-----------------+----------------+
So, I wrote a query similar to
select a.ColX, a.ColY, a.ColZ, a.ColZa, a.A, b.B from TableA a FULL OUTER JOIN TableB b ON a.ColX=b.ColX AND a.ColY=b.ColY AND a.ColZ=b.ColZ
where (
a.ColX IN
(select ColX from TableA where
ColX IN
(select ColX from TableC group by ColX HAVING count(ColY)=1) and
ColX in
(select distinct ColX from TableB where ColZ = 'g1'and B > trunc(sysdate) - 365)
group by ColX having count(distinct ColY)=1)
OR
b.ColX IN
(select ColX from TableA where
ColX IN
(select ColX from TableC group by ColX HAVING count(ColY)=1) and
ColX in
(select distinct ColX from TableB where ColZ = 'g1' and B > trunc(sysdate) - 365)
group by ColX having count(distinct ColY)=1));
I have no control over the data model here. How do I make my query work?
TableA and TableB hold on the order of 100,000 records each, and TableC and TableD hold up to a million.
SQL is not my area of expertise and I really hope I am not going too off the mark here.

I didn't understand what your query is supposed to do, but as a pure refactoring exercise I get this:
with whatever as
     ( select colx
       from   tablea
       where  colx in
              ( select colx
                from   tablec
                group by colx having count(coly) = 1 )
       and    colx in
              ( select colx
                from   tableb
                where  colz = 'g1'
                and    b > trunc(sysdate) - 365 )
       group by colx
       having count(distinct coly) = 1 )
select a.colx, a.coly, a.colz, a.colza, a.a, b.b
from   tablea a
       full outer join tableb b
            on  a.colx = b.colx
            and a.coly = b.coly
            and a.colz = b.colz
       join whatever w
            on  w.colx in (a.colx, b.colx);

Related

Count multiple Columns in Oracle SQL

I am looking to count the occurrence of IDs in 3 different columns using SQL. The raw table would look like:
id | A | B | C
------------------
1 | A1 | B1 | C1
2 | A1 | B2 | C1
3 | A1 | B1 | C2
4 | A2 | B1 | C2
5 | A1 | B2 | C2
The desired Table should look like:
id | A | count(A) | B | count(B) | C | count(C)
--------------------------------------------------
1 | A1 | 4 | B1 | 3 | C1 | 2
2 | A1 | 4 | B2 | 2 | C1 | 2
3 | A1 | 4 | B1 | 3 | C2 | 3
4 | A2 | 1 | B1 | 3 | C2 | 3
5 | A1 | 4 | B2 | 2 | C2 | 3
I tried the below query for a single column but didn't quite get the desired results:
SELECT A, COUNT(A) from table_name GROUP BY A;
But am unable to do the same for 3 columns.

Use COUNT as an analytic function:
SELECT
id,
A,
COUNT(*) OVER (PARTITION BY A) cnt_A,
B,
COUNT(*) OVER (PARTITION BY B) cnt_B,
C,
COUNT(*) OVER (PARTITION BY C) cnt_C
FROM yourTable
ORDER BY id;
You don't want GROUP BY here, at least not by itself, because that aggregates the original records all of which you want to include in the output.
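To see the difference from GROUP BY on the sample data, here is a minimal sketch that runs the same analytic query against an in-memory SQLite database from Python. SQLite stands in for Oracle purely as an assumption for illustration (window functions need SQLite 3.25 or later), and the table name your_table is hypothetical:

```python
import sqlite3

# In-memory table mirroring the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [(1, "A1", "B1", "C1"), (2, "A1", "B2", "C1"), (3, "A1", "B1", "C2"),
     (4, "A2", "B1", "C2"), (5, "A1", "B2", "C2")],
)

# Each COUNT(*) OVER (...) counts the rows that share the current row's
# value in that column, without collapsing the rows as GROUP BY would.
rows = conn.execute(
    """
    SELECT id,
           a, COUNT(*) OVER (PARTITION BY a) AS cnt_a,
           b, COUNT(*) OVER (PARTITION BY b) AS cnt_b,
           c, COUNT(*) OVER (PARTITION BY c) AS cnt_c
    FROM your_table
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

All five input rows come back, each carrying the three counts from the desired table.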

SQL: Update with all possible combinations

I have a relation
+-----+----+
| seq | id |
+-----+----+
| 1 | A1 |
| 2 | B1 |
| 3 | C1 |
| 4 | D1 |
+-----+----+
and want to join it in PostgreSQL with
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| D1 | D2 |
+----+-------+
so I get all possible combinations of replacements (i.e. more or less the Cartesian product of the replacements). So group 1 has no update, group 2 only B2, group 3 only D2 and group 4 both B2 and D2.
The end result should look like this, but should be open to more replacements (like an extra D3 for D1):
+-------+-----+----+
| group | seq | id |
+-------+-----+----+
| 1 | 1 | A1 |
| 1 | 2 | B1 |
| 1 | 3 | C1 |
| 1 | 4 | D1 |
| 2 | 1 | A1 |
| 2 | 2 | B2 |
| 2 | 3 | C1 |
| 2 | 4 | D1 |
| 3 | 1 | A1 |
| 3 | 2 | B1 |
| 3 | 3 | C1 |
| 3 | 4 | D2 |
| 4 | 1 | A1 |
| 4 | 2 | B2 |
| 4 | 3 | C1 |
| 4 | 4 | D2 |
+-------+-----+----+
EDIT:
Another possible replacement table could be
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| D1 | D2 |
| D1 | D3 |
+----+-------+
which should result in 6 groups (I hope I haven't forgotten a case)
+-------+-----+----+
| group | seq | id |
+-------+-----+----+
| 1 | 1 | A1 |
| 1 | 2 | B1 |
| 1 | 3 | C1 |
| 1 | 4 | D1 |
| 2 | 1 | A1 |
| 2 | 2 | B2 |
| 2 | 3 | C1 |
| 2 | 4 | D1 |
| 3 | 1 | A1 |
| 3 | 2 | B2 |
| 3 | 3 | C1 |
| 3 | 4 | D2 |
| 4 | 1 | A1 |
| 4 | 2 | B2 |
| 4 | 3 | C1 |
| 4 | 4 | D3 |
| 5 | 1 | A1 |
| 5 | 2 | B1 |
| 5 | 3 | C1 |
| 5 | 4 | D2 |
| 6 | 1 | A1 |
| 6 | 2 | B1 |
| 6 | 3 | C1 |
| 6 | 4 | D3 |
+-------+-----+----+
If you have instead three replacements like
+----+-------+
| id | alter |
+----+-------+
| B1 | B2 |
| C1 | C2 |
| D1 | D3 |
+----+-------+
It'll result in 8 groups.
What I tried so far was not really helpful:
WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
---------
SELECT row_number() OVER (PARTITION BY a.id) as g, * FROM
a
CROSS JOIN b as b1
CROSS JOIN b as b2
LEFT JOIN b as b3 ON a.id=b3.id
ORDER by g,seq;

"So group 1 has no update, group 2 only B2, group 3 only D2 and group 4 both B2 and D2."
Since this logic is not stored in any table, I encoded it in a CTE c, which adds three new columns to the existing table a, one for each selection of replacements to apply.
WITH a as (SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id) )
, b as (SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter) )
, c as (
SELECT a.seq, a.id,
COALESCE(b1.alter,a.id) as id2,
COALESCE(b2.alter,a.id) as id3,
COALESCE(b3.alter,a.id) as id4
FROM a
LEFT JOIN (SELECT * FROM b WHERE b.alter='B2') b1 ON a.id = b1.id
LEFT JOIN (SELECT * FROM b WHERE b.alter='D2') b2 ON a.id = b2.id
LEFT JOIN (SELECT * FROM b WHERE b.alter IN ('B2','D2')) b3 ON a.id = b3.id)
, d as (SELECT * FROM (values (1),(2), (3), (4) ) as d1(gr) )
SELECT d.gr,
CASE d.gr
WHEN 1 THEN c.id
WHEN 2 THEN c.id2
WHEN 3 THEN c.id3
WHEN 4 THEN c.id4 END as id
FROM d
CROSS JOIN c
ORDER by d.gr, c.seq

What you need
After additional info from your comments, it seems like this is your case:
You have toll stations with a given number of booths:
CREATE TABLE station (
station text PRIMARY KEY
, booths int NOT NULL -- number of cashiers in station
);
INSERT INTO station VALUES
('A', 1)
, ('B', 2)
, ('C', 1)
, ('D', 3);
For a given route, say A --> B --> C --> D you want to generate all possible paths, taking booth numbers into account. I suggest an SQL function with a recursive CTE like:
CREATE OR REPLACE FUNCTION f_pathfinder(_route text[])
RETURNS TABLE (grp int, path text[]) LANGUAGE sql STABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE rcte AS (
SELECT cardinality($1) AS hops, 1 AS hop, ARRAY[s.station || booth] AS path
FROM station s, generate_series(1, s.booths) booth
WHERE s.station = $1[1]
UNION ALL
SELECT r.hops, r.hop + 1, r.path || (s.station || booth)
FROM rcte r
JOIN station s ON s.station = _route[r.hop + 1], generate_series(1, s.booths) booth
WHERE r.hop < r.hops
)
SELECT row_number() OVER ()::int AS grp, path
FROM rcte r
WHERE r.hop = r.hops;
$func$;
Simple call:
SELECT * FROM f_pathfinder('{A,B,C,D}'::text[]);
Result:
grp | path
---: | :--------
1 | {A1,B1,C1,D1}
2 | {A1,B1,C1,D2}
3 | {A1,B1,C1,D3}
4 | {A1,B2,C1,D1}
5 | {A1,B2,C1,D2}
6 | {A1,B2,C1,D3}
Or with unnested arrays (result like you show in question):
SELECT grp, seq, booth
FROM f_pathfinder('{A,B,C,D}'::text[])
, unnest(path) WITH ORDINALITY AS x(booth, seq); -- ①
Result:
grp | seq | booth
--: | --: | :----
1 | 1 | A1
1 | 2 | B1
1 | 3 | C1
1 | 4 | D1
2 | 1 | A1
2 | 2 | B1
2 | 3 | C1
2 | 4 | D2
3 | 1 | A1
3 | 2 | B1
3 | 3 | C1
3 | 4 | D3
4 | 1 | A1
4 | 2 | B2
4 | 3 | C1
4 | 4 | D1
5 | 1 | A1
5 | 2 | B2
5 | 3 | C1
5 | 4 | D2
6 | 1 | A1
6 | 2 | B2
6 | 3 | C1
6 | 4 | D3
The number of variants grows quickly with the number of stops in your route. It's M1 * M2 * ... * Mn, with Mi being the number of booths at the i-th station.
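The product formula can be cross-checked outside the database. A minimal Python sketch, assuming the booth counts from the INSERT above; the names booths, route and choices are illustrative and not part of the SQL function:

```python
from itertools import product

# Booth counts per station, copied from the INSERT statement above.
booths = {"A": 1, "B": 2, "C": 1, "D": 3}
route = ["A", "B", "C", "D"]

# One list of labelled booths per stop, e.g. ["B1", "B2"] for station B.
choices = [[f"{s}{n}" for n in range(1, booths[s] + 1)] for s in route]

# The Cartesian product yields one path per combination of booths,
# so the number of paths is 1 * 2 * 1 * 3.
paths = [list(p) for p in product(*choices)]

for grp, path in enumerate(paths, start=1):
    print(grp, path)
```

For this route the sketch yields 6 paths, the same count as the function call above.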
① About ORDINALITY:
PostgreSQL unnest() with element number
What you asked (originally)
Seems like you want to apply all possible combinations from the set of changes listed in the replacement table rpl to the target table tbl.
With just two rows, forming the 4 (2^n) possible combinations is simple. For a general solution, I suggest a basic combinatorics function to generate all combinations. There are innumerable ways. Here is a pure SQL function:
CREATE OR REPLACE FUNCTION f_allcombos(_len int)
RETURNS SETOF bool[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
WITH RECURSIVE
tf(b) AS (VALUES (false), (true))
, rcte AS (
SELECT 1 AS lvl, ARRAY[b] AS arr
FROM tf
UNION ALL
SELECT r.lvl + 1, r.arr || tf.b
FROM rcte r, tf
WHERE lvl < _len
)
SELECT arr
FROM rcte
WHERE lvl = _len;
$func$;
Similar to what's discussed here:
Generate all combinations of given list of elements, sorted
Example for just 2 replacement rows:
SELECT * FROM f_allcombos(2);
{f,f}
{t,f}
{f,t}
{t,t}
Query
WITH effective_rpl AS ( -- ①
SELECT *, count(alter) OVER (ORDER BY seq) AS idx -- ②
FROM tbl LEFT JOIN rpl USING (id)
)
SELECT c.grp, e.seq
, CASE WHEN alter IS NOT NULL AND c.arr[e.idx] THEN e.alter -- ③
ELSE e.id END AS id
FROM effective_rpl e
, f_allcombos((SELECT count(alter)::int FROM effective_rpl)) -- ④
WITH ORDINALITY AS c(arr, grp); -- ⑤
Produces your desired result exactly.
① Some of the replacements may have no match in the target table; so determine effective replacements to begin with.
② count() only counts non-null values, so this can serve as index for the 1-based array returned from f_allcombos().
③ Only replace when a replacement is available, and the boolean array has true for the given index idx.
④ The CROSS JOIN multiplies the set of rows in the target table by the number of possible replacement combinations.
⑤ I use WITH ORDINALITY to generate "group numbers". See:
PostgreSQL unnest() with element number
We might wire that into the function directly, but I'd rather keep it generic.
Aside: "alter" is non-reserved in Postgres but a reserved word in standard SQL.
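The interplay of the running count (②) and the boolean array (③) may be easier to follow outside SQL. Below is a minimal Python sketch of the same idea, assuming at most one replacement per id. The names tbl, rpl and effective mirror the query; everything else is illustrative, and the group numbering need not match the order Postgres returns:

```python
from itertools import product

tbl = [(1, "A1"), (2, "B1"), (3, "C1"), (4, "D1")]  # (seq, id)
rpl = {"B1": "B2", "D1": "D2"}                      # id -> replacement

# Running count of rows that have a replacement, like
# count(alter) OVER (ORDER BY seq); it serves as a 1-based index into the mask.
effective = []
idx = 0
for seq, id_ in tbl:
    alt = rpl.get(id_)
    if alt is not None:
        idx += 1
    effective.append((seq, id_, alt, idx))

result = []
# Every true/false mask of length idx describes one group.
for grp, mask in enumerate(product([False, True], repeat=idx), start=1):
    for seq, id_, alt, i in effective:
        # Replace only when a replacement exists and its mask bit is set.
        use_alt = alt is not None and mask[i - 1]
        result.append((grp, seq, alt if use_alt else id_))

for row in result:
    print(row)
```

Rows without a replacement inherit the index of the previous replaceable row, which is harmless because the `alt is not None` check guards the lookup, just as the CASE expression does in the query.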

answer updated after edit to question
The tricky part of this problem is generating the powerset of the replacements. Luckily, Postgres supports recursive queries, and powersets can be computed recursively. Thus we can build a general solution that works regardless of the size of your replacement set.
Let's call the first table source and the second table replacements, and I'll avoid the unsavory column name alter in favor of something else:
CREATE TABLE source (seq, id) as (
VALUES (1, 'A1'), (2, 'B1'), (3, 'C1'), (4, 'D1')
);
CREATE TABLE replacements (id, sub) as (
VALUES ('B1', 'B2'), ('D1', 'D2')
);
First, the powerset of the ids to be replaced needs to be generated. The empty set may be omitted since it won't work with joins anyhow; at the end, the source table can be UNIONed to the intermediate result to provide the same output.
In the recursive step, the JOIN condition rec.id > repl.id ensures that each id is present only once in each generated subset.
In the final step:
the cross join fans out the source N times, where N is the number of non-empty combinations of replacements (with variations)
the group names are generated using a filtered running sum on seq.
the subsets are unnested & the ids replaced using a coalesce if the replacement id equals the source id.
WITH RECURSIVE rec AS (
SELECT ARRAY[(id, sub)] subset, id FROM replacements
UNION ALL
SELECT subset || (repl.id, sub), repl.id
FROM replacements repl
JOIN rec ON rec.id > repl.id
)
SELECT NULL subset, 0 set_name, seq, id FROM source
UNION ALL
SELECT subset
, SUM(seq) FILTER (WHERE seq = 1) OVER (ORDER BY subset, seq) set_name
, seq
, COALESCE(sub, source.id) id
FROM rec
CROSS JOIN source
LEFT JOIN LATERAL (
SELECT id, sub
FROM unnest(subset) x(id TEXT, sub TEXT)
) x ON source.id = x.id;
Tests
With replacement values ('B1', 'B2'), ('D1', 'D2'), the query returns 4 groups.
subset | set_name | seq | id
-----------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(D1,D2)"} | 2 | 1 | A1
{"(D1,D2)"} | 2 | 2 | B1
{"(D1,D2)"} | 2 | 3 | C1
{"(D1,D2)"} | 2 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 3 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 3 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 3 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 3 | 4 | D2
(16 rows)
With replacement values ('B1', 'B2'), ('D1', 'D2'), ('D1', 'D3'), the query returns 6 groups:
subset | set_name | seq | id
-----------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(D1,D2)"} | 2 | 1 | A1
{"(D1,D2)"} | 2 | 2 | B1
{"(D1,D2)"} | 2 | 3 | C1
{"(D1,D2)"} | 2 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 3 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 3 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 3 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 3 | 4 | D2
{"(D1,D3)"} | 4 | 1 | A1
{"(D1,D3)"} | 4 | 2 | B1
{"(D1,D3)"} | 4 | 3 | C1
{"(D1,D3)"} | 4 | 4 | D3
{"(D1,D3)","(B1,B2)"} | 5 | 1 | A1
{"(D1,D3)","(B1,B2)"} | 5 | 2 | B2
{"(D1,D3)","(B1,B2)"} | 5 | 3 | C1
{"(D1,D3)","(B1,B2)"} | 5 | 4 | D3
(24 rows)
With replacement values ('B1', 'B2'), ('C1', 'C2'), ('D1', 'D2'), the query returns 8 groups:
subset | set_name | seq | id
---------------------------------+----------+-----+----
| 0 | 1 | A1
| 0 | 2 | B1
| 0 | 3 | C1
| 0 | 4 | D1
{"(B1,B2)"} | 1 | 1 | A1
{"(B1,B2)"} | 1 | 2 | B2
{"(B1,B2)"} | 1 | 3 | C1
{"(B1,B2)"} | 1 | 4 | D1
{"(C1,C2)"} | 2 | 1 | A1
{"(C1,C2)"} | 2 | 2 | B1
{"(C1,C2)"} | 2 | 3 | C2
{"(C1,C2)"} | 2 | 4 | D1
{"(C1,C2)","(B1,B2)"} | 3 | 1 | A1
{"(C1,C2)","(B1,B2)"} | 3 | 2 | B2
{"(C1,C2)","(B1,B2)"} | 3 | 3 | C2
{"(C1,C2)","(B1,B2)"} | 3 | 4 | D1
{"(D1,D2)"} | 4 | 1 | A1
{"(D1,D2)"} | 4 | 2 | B1
{"(D1,D2)"} | 4 | 3 | C1
{"(D1,D2)"} | 4 | 4 | D2
{"(D1,D2)","(B1,B2)"} | 5 | 1 | A1
{"(D1,D2)","(B1,B2)"} | 5 | 2 | B2
{"(D1,D2)","(B1,B2)"} | 5 | 3 | C1
{"(D1,D2)","(B1,B2)"} | 5 | 4 | D2
{"(D1,D2)","(C1,C2)"} | 6 | 1 | A1
{"(D1,D2)","(C1,C2)"} | 6 | 2 | B1
{"(D1,D2)","(C1,C2)"} | 6 | 3 | C2
{"(D1,D2)","(C1,C2)"} | 6 | 4 | D2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 1 | A1
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 2 | B2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 3 | C2
{"(D1,D2)","(C1,C2)","(B1,B2)"} | 7 | 4 | D2
(32 rows)
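The group counts in these three tests can be cross-checked without a database: each id either keeps its original value or takes exactly one of its alternatives, so the number of groups is the product of (1 + number of alternatives) over the ids. A minimal Python sketch, where the helper groups is hypothetical and only mirrors the counting, not the SQL:

```python
from itertools import product


def groups(replacements):
    """Return every variant: per id, keep it (None) or pick one alternative."""
    options = {}
    for id_, sub in replacements:
        options.setdefault(id_, [None]).append(sub)  # None = keep the original
    ids = sorted(options)
    return [dict(zip(ids, combo)) for combo in product(*(options[i] for i in ids))]


cases = [
    [("B1", "B2"), ("D1", "D2")],
    [("B1", "B2"), ("D1", "D2"), ("D1", "D3")],
    [("B1", "B2"), ("C1", "C2"), ("D1", "D2")],
]
counts = [len(groups(c)) for c in cases]
print(counts)  # [4, 6, 8]
```

The counts include the unmodified source (set_name 0 in the tables above), which is why the second case gives 2 * 3 = 6 rather than 2^3 = 8.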

I can only think of a brute force approach. Enumerate the groups and multiply the second table -- so one set of rows for each group.
The following then uses bit manipulation to choose which value:
WITH a as (
SELECT * FROM (values (1,'A1'),(2,'B1'), (3,'C1'), (4,'D1') ) as a1(seq, id)
),
b as (
SELECT * FROM (values ('B1','B2'), ('D1','D2')) as b1(id,alter)
),
bgroups as (
SELECT b.*, grp - 1 as grp, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) - 1 as seqnum
FROM b CROSS JOIN
GENERATE_SERIES(1, (SELECT POWER(2, COUNT(*))::int FROM b)) gs(grp)
)
SELECT bg.grp, a.seq,
COALESCE(MAX(CASE WHEN a.id = bg.id AND (POWER(2, bg.seqnum)::int & bg.grp) > 0 THEN bg.alter END),
MAX(a.id)
) as id
FROM a CROSS JOIN
bgroups bg
GROUP BY bg.grp, a.seq
ORDER BY bg.grp, a.seq;

Add additional column to give a count on number of occurance on an existing column

I have an existing table with many fields of personal data.
For each personal record, there is a unique reference number.
I am trying to write a script that adds a new column to the existing table.
This new column should hold the number of times the unique reference on each row occurs in the table.
For example:
---------------------------
UniqueID | PersonalData1 | PersonalData2 |
A | A1 | A2 |
B | B1 | B2 |
C | C1 | C2 |
D | D1 | D2 |
A | AA1 | AA2 |
D | DD1 | DD2 |
To become:
---------------------------
UniqueID | PersonalData1 | PersonalData2 | CountID |
A | A1 | A2 | 2 |
B | B1 | B2 | 1 |
C | C1 | C2 | 1 |
D | D1 | D2 | 2 |
A | AA1 | AA2 | 2 |
D | DD1 | DD2 | 2 |

We can try using COUNT as an analytic function here:
SELECT
UniqueID,
PersonalData1,
PersonalData2,
COUNT(*) OVER (PARTITION BY UniqueID) CountID
FROM yourTable;
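The query above computes the count on the fly. If the count really has to be stored as a new column, one possible approach is ALTER TABLE followed by an UPDATE with a correlated subquery. The sketch below is only an illustration under stated assumptions: the question does not name a database, so SQLite (driven from Python) is used, and the table name people is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (UniqueID TEXT, PersonalData1 TEXT, PersonalData2 TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("A", "A1", "A2"), ("B", "B1", "B2"), ("C", "C1", "C2"),
     ("D", "D1", "D2"), ("A", "AA1", "AA2"), ("D", "DD1", "DD2")],
)

# Add the new column, then fill it with the number of rows sharing each UniqueID.
conn.execute("ALTER TABLE people ADD COLUMN CountID INTEGER")
conn.execute(
    """
    UPDATE people
    SET CountID = (SELECT COUNT(*)
                   FROM people AS p2
                   WHERE p2.UniqueID = people.UniqueID)
    """
)

rows = conn.execute(
    "SELECT UniqueID, PersonalData1, CountID FROM people ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

A stored count goes stale as soon as rows are inserted or deleted, so the analytic query is usually the safer choice unless the column is refreshed regularly.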

How to return values from same table?

I've two tables A and B. I want to return all records from A and only matching from B. I can use left join for this. But after joining, I want to return records based on a flag in the same table.
Table A:
| Col1 | Col2 |
|------|------|
| 123 | 12 |
| 456 | 34 |
| 789 | 56 |
Table B:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|------|------|
| 123 | 12 | NULL | I | 1 |
| 456 | 34 | NULL | E | 1 |
| 111 | 98 | NULL | I | 1 |
| 222 | 99 | NULL | E | 1 |
| 123 | 12 | AB | NULL | 2 |
| 456 | 34 | CD | NULL | 2 |
| 123 | 12 | EF | NULL | 2 |
| 111 | 98 | GH | NULL | 2 |
| 222 | 99 | IJ | NULL | 2 |
After left joining A and B, this is how the result will look:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|------|------|
| 123 | 12 | NULL | I | 1 |
| 456 | 34 | NULL | E | 1 |
| 123 | 12 | AB | NULL | 2 |
| 456 | 34 | CD | NULL | 2 |
| 123 | 12 | EF | NULL | 2 |
| 789 | 56 | NULL | NULL | NULL |
The values 1 and 2 in Col5 tell whether Col4 or Col3 is populated: 1 for Col4 and 2 for Col3.
I want to return all the records for 'I' in Col4 (excluding the record that itself has 'I'), which will look like this:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 123 | 12 | AB | (null) | 2 |
| 123 | 12 | EF | (null) | 2 |
I also want to return records for 'E' in Col4 (again excluding the record that itself has 'E'), but with all the Col3 values other than the one matched, in this case CD. That would look like this:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 456 | 34 | AB | (null) | 2 |
| 456 | 34 | EF | (null) | 2 |
| 456 | 34 | GH | (null) | 2 |
| 456 | 34 | IJ | (null) | 2 |
Can someone suggest how to handle this in SQL?

Ok, I believe the following two queries achieve your desired results.
Existence Rule:
select A.*
, B.Col3
, B.Col4
, B.Col5
from TableA A
JOIN TableB B
on A.Col1 = B.Col1
and A.Col2 = B.Col2
and B.Col5 = 2
where exists (select 1 from TableB C
where C.col1 = B.col1 and C.col2 = B.col2
and c.col4 = 'I' AND C.col5 = 1)
Results:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 123 | 12 | AB | (null) | 2 |
| 123 | 12 | EF | (null) | 2 |
Exclusion Rule:
select A.*
, B.Col3
, B.Col4
, B.Col5
from TableA A
CROSS JOIN TableB B
where b.col5 = 2
and exists (select 1 from TableB C
where C.col1 = a.col1 and C.col2 = a.col2
and c.col4 = 'E' AND C.col5 = 1)
and b.col3 not in (select col3 from TableB b
where b.col1 = a.col1 and b.col2 = a.col2 and b.col5 = 2)
Results:
| Col1 | Col2 | Col3 | Col4 | Col5 |
|------|------|------|--------|------|
| 456 | 34 | AB | (null) | 2 |
| 456 | 34 | EF | (null) | 2 |
| 456 | 34 | GH | (null) | 2 |
| 456 | 34 | IJ | (null) | 2 |
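Both rules can be tried against the sample data. The following minimal sketch loads the two tables into an in-memory SQLite database from Python; SQLite is an assumption made for illustration, since the question does not name a database. It runs the two queries with a trimmed column list, and the inner alias in the exclusion rule is renamed to X so it no longer shadows the outer B:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (Col1 INTEGER, Col2 INTEGER);
    CREATE TABLE TableB (Col1 INTEGER, Col2 INTEGER, Col3 TEXT, Col4 TEXT, Col5 INTEGER);
    INSERT INTO TableA VALUES (123, 12), (456, 34), (789, 56);
    INSERT INTO TableB VALUES
      (123, 12, NULL, 'I', 1), (456, 34, NULL, 'E', 1),
      (111, 98, NULL, 'I', 1), (222, 99, NULL, 'E', 1),
      (123, 12, 'AB', NULL, 2), (456, 34, 'CD', NULL, 2),
      (123, 12, 'EF', NULL, 2), (111, 98, 'GH', NULL, 2),
      (222, 99, 'IJ', NULL, 2);
    """
)

# Existence rule: detail rows (Col5 = 2) of keys that have an 'I' flag row.
existence = conn.execute(
    """
    SELECT A.Col1, A.Col2, B.Col3
    FROM TableA A
    JOIN TableB B ON A.Col1 = B.Col1 AND A.Col2 = B.Col2 AND B.Col5 = 2
    WHERE EXISTS (SELECT 1 FROM TableB C
                  WHERE C.Col1 = B.Col1 AND C.Col2 = B.Col2
                    AND C.Col4 = 'I' AND C.Col5 = 1)
    ORDER BY B.Col3
    """
).fetchall()

# Exclusion rule: for keys flagged 'E', every Col3 value except the key's own.
exclusion = conn.execute(
    """
    SELECT A.Col1, A.Col2, B.Col3
    FROM TableA A
    CROSS JOIN TableB B
    WHERE B.Col5 = 2
      AND EXISTS (SELECT 1 FROM TableB C
                  WHERE C.Col1 = A.Col1 AND C.Col2 = A.Col2
                    AND C.Col4 = 'E' AND C.Col5 = 1)
      AND B.Col3 NOT IN (SELECT X.Col3 FROM TableB X
                         WHERE X.Col1 = A.Col1 AND X.Col2 = A.Col2 AND X.Col5 = 2)
    ORDER BY B.Col3
    """
).fetchall()

print(existence)
print(exclusion)
```

Note that NOT IN behaves unexpectedly if the subquery returns a NULL; here the Col5 = 2 filter keeps only rows with a populated Col3, so the sample data is not affected.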

Result for I:-
;with cte1 As(select a.col1,a.col2 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 = 'I'),
cte2 As(select b.col3,b.col4,b.col5 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 <> 'I')
select a.col1,a.col2,b.col3,b.col4,b.col5 from cte1 a cross join cte2 b
Result for E:-
;with cte1 As(select a.col1,a.col2 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 = 'E'),
cte2 As(select b.col3,b.col4,b.col5 from A a left join B b on a.col1 =b.col1 and a.col2=b.col2 where b.col4 <> 'E')
select a.col1,a.col2,b.col3,b.col4,b.col5 from cte1 a cross join cte2 b

select c.col1, c.col2
from
(select a.col1, a.col2, b.col3 from a inner join b on a.id = b.id
where "condition" ) c
where c.col1 = "condition"
This is the script. The explanation is:
Inside the () I wrote the first select. There, you do the select with your joins and your conditions. At the end of the select I wrote "c", which is the name of the table generated by the sub-select.
Then you select some values from the generated table and filter them with a where clause that acts on the rows produced by the sub-select.
EDIT: I used your question's names to make it easier

SQL conditional join to multiple tables in one shot

I would like to make conditional join of three tables. Left join table B and if the key is missing then join to table C.
+------+--+------+-------------+--+------+-------------+
| A.id | | B.id | B.OtherFish | | C.id | C.OtherFish |
+------+--+------+-------------+--+------+-------------+
| 1 | | 1 | B1 | | 1 | C1 |
| 2 | | | | | 2 | C2 |
| 3 | | 3 | B3 | | 3 | C3 |
| 4 | | | | | 4 | C4 |
| 5 | | 5 | B5 | | | |
| 6 | | 6 | B6 | | 6 | C6 |
+------+--+------+-------------+--+------+-------------+
There are no matching keys for 2 and 4 in B, and no keys for 5 in C.
Expected results:
+------+-----------+
| A.id | OtherFish |
+------+-----------+
| 1 | B1 |
| 2 | C2 |
| 3 | B3 |
| 4 | C4 |
| 5 | B5 |
| 6 | B6 |
+------+-----------+
The query I use is:
select
A.id
,coalesce(B.id,C.id)
,coalesce(B.OtherFish,C.OtherFish)
from A
left join B
on A.id=B.id
left join C
on A.id=C.id
The drawback of this approach is that I have to use coalesce on every column I need from the different tables. That is annoying if there are a hundred columns in the table. It would be desirable to coalesce whole aliases, like coalesce(B,C).
Is it possible to make it in one shot like:
left join B
on A.id=B.id
left join C
on A.id=(case when B.id is null then C.id end)
so that in C.id I would have all good data without making coalesce through all the columns?

Although I don't see anything wrong with COALESCE, you can try using UNION, something like this:
SELECT a.id, t.OtherFish
FROM A a
LEFT JOIN (SELECT b.id, b.OtherFish FROM B b
           UNION ALL
           SELECT c.id, c.OtherFish FROM C c
           WHERE NOT EXISTS (SELECT 1 FROM B bb WHERE bb.id = c.id)) t
       ON a.id = t.id
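To check the idea against the sample data from the question, here is a minimal sketch using an in-memory SQLite database from Python. SQLite is an assumption made for illustration, since the question does not name a database, and the subquery is written with the standard WHERE NOT EXISTS (...) form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, OtherFish TEXT);
    CREATE TABLE C (id INTEGER, OtherFish TEXT);
    INSERT INTO A VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO B VALUES (1, 'B1'), (3, 'B3'), (5, 'B5'), (6, 'B6');
    INSERT INTO C VALUES (1, 'C1'), (2, 'C2'), (3, 'C3'), (4, 'C4'), (6, 'C6');
    """
)

# Take every row of B, plus only those rows of C whose id is missing from B,
# then left join the combined set to A once.
rows = conn.execute(
    """
    SELECT a.id, t.OtherFish
    FROM A a
    LEFT JOIN (SELECT b.id AS id, b.OtherFish AS OtherFish FROM B b
               UNION ALL
               SELECT c.id, c.OtherFish FROM C c
               WHERE NOT EXISTS (SELECT 1 FROM B bb WHERE bb.id = c.id)) t
           ON a.id = t.id
    ORDER BY a.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because the fallback is resolved inside the derived table, any further columns only need to be listed once per branch of the UNION ALL instead of being wrapped in COALESCE one by one.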
