How to tune the multiple self join query in SQL Server

I have a table like the one below.
I need to fetch summaries based on issue id and parent id: the summary of the topmost parent as parent_desc, and the summary of the row directly below it as succeed_desc. The hierarchy has at most 5 levels.
So I self-joined the table 5 times; the code below is an example. But the query takes almost 250 seconds to execute. How can I optimize it in SQL Server?
Expected output
WITH cte_a AS
(
SELECT
'AON property' AS summary, 1001 AS issue_id, 2001 AS parent_id
UNION ALL
SELECT 'AON property L1', 2001, 3001
UNION ALL
SELECT 'AON Property L2', 3001, 4001
UNION ALL
SELECT 'AON Property L3', 4001, NULL
UNION ALL
SELECT 'LONG CHAIN CLUBS', 1002, 2222
UNION ALL
SELECT 'LONG CHAIN L1', 2222, 3003
UNION ALL
SELECT 'LONG CHAIN L2', 3003, NULL
)
SELECT
a.*,
CASE
WHEN f.summary IS NOT NULL THEN e.summary
WHEN e.summary IS NOT NULL THEN d.summary
WHEN d.summary IS NOT NULL THEN c.summary
WHEN c.summary IS NOT NULL THEN b.summary
WHEN b.summary IS NOT NULL THEN a.summary
END AS succeed_desc,
COALESCE (f.summary, e.summary, d.summary, c.summary, b.summary, a.summary) AS parent_desc
FROM
cte_a a
LEFT JOIN
cte_a b ON a.parent_id = b.issue_id --Level1
LEFT JOIN
cte_a c ON b.parent_id = c.issue_id --Level2
LEFT JOIN
cte_a d ON c.parent_id = d.issue_id --Level3
LEFT JOIN
cte_a e ON d.parent_id = e.issue_id --Level4
LEFT JOIN
cte_a f ON e.parent_id = f.issue_id --Level5

I agree with Martin Smith saying in the comments that you need to define an INDEX.
With that said, you also need to look into recursive CTEs. summary, issue_id, parent_id all come from cte_a; parent_desc is going to be the result of the recursive CTE and succeed_desc is the result of a simple join.
I will work my way through in 3 steps.
Step 1: Building up the recursive CTE
A recursive CTE is in the form:
WITH Recursive_CTE AS (
SELECT ...
FROM SomeTable
UNION ALL
SELECT ...
FROM Recursive_CTE
JOIN SomeTable ON ...
)
In your case, you want to walk from each record's parent_id to the record with the matching issue_id. Importantly, you need to carry along the original issue_id and parent_id to build the final result, AND the joined parent_id to keep the recursion going.
For clarity's sake, I will add a column so you can keep track of the recursion (I will drop it later).
WITH cte_a(summary, issue_id, parent_id) AS
(
...
), cte_b(summary, issue_id, parent_id, ancestor_id, loop_number) AS (
SELECT summary, issue_id, parent_id, parent_id, 0 from cte_a
UNION ALL
SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id, cte_a.parent_id, 1+loop_number
FROM cte_a
JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT *
FROM cte_b
Note that parent_id appears twice in the initialisation of the recursive CTE. You should not be surprised by that: 1 is to keep the original value, the other is for the loop. The difference shows after the UNION ALL (cte_b.parent_id vs cte_a.parent_id).
Step 2: Preparing the output for the final query
We now need to tweak the above query in order to:
Add the parent_desc column (that will be in the recursive CTE)
Allow the addition of the succeed_desc column (that will also be in the CTE)
Keep only the records we want to keep in the final result (that will be outside the CTE)
For that, we add 2 columns in the CTE and the records we want to keep are the ones where the recursion stopped (ancestor_id IS NULL):
WITH cte_a(summary, issue_id, parent_id) AS
(
...
), cte_b(summary, issue_id, parent_id, penultimate_ancestor_id, ancestor_id, ancestor_summary) AS (
SELECT summary, issue_id, parent_id, issue_id, parent_id, summary from cte_a
UNION ALL
SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id,
cte_a.issue_id, cte_a.parent_id, cte_a.summary
FROM cte_a
JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT *
FROM cte_b
WHERE ancestor_id IS NULL
Step 3: Finalizing the query
In the previous step, we added the penultimate_ancestor_id column to keep track of the last issue_id encountered before reaching NULL. succeed_desc can easily be obtained using it.
WITH cte_a(summary, issue_id, parent_id) AS
(
...
), cte_b(summary, issue_id, parent_id, penultimate_ancestor_id, ancestor_id, ancestor_summary) AS (
SELECT summary, issue_id, parent_id, issue_id, parent_id, summary from cte_a
UNION ALL
SELECT cte_b.summary, cte_b.issue_id, cte_b.parent_id,
cte_a.issue_id, cte_a.parent_id, cte_a.summary
FROM cte_a
JOIN cte_b ON cte_a.issue_id = cte_b.ancestor_id
)
SELECT subquery.summary,
subquery.issue_id,
subquery.parent_id,
cte_a.summary as succeed_desc,
subquery.ancestor_summary as parent_desc
FROM (SELECT * FROM cte_b WHERE ancestor_id IS NULL) subquery
LEFT OUTER JOIN cte_a ON subquery.parent_id IS NOT NULL
AND subquery.penultimate_ancestor_id = cte_a.parent_id
ORDER BY parent_desc, issue_id
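To try the Step 3 query end to end, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module rather than SQL Server, so the syntax differs slightly (SQLite accepts WITH RECURSIVE; SQL Server uses plain WITH). The table name issues, the CTE name walk and the index name ix_issues_issue_id are assumptions made for this illustration; the index reflects the advice above to index the column used for the parent lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (summary TEXT, issue_id INTEGER, parent_id INTEGER)")
# Hypothetical index on the column each recursion step looks up.
conn.execute("CREATE INDEX ix_issues_issue_id ON issues (issue_id)")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?)",
    [
        ("AON property", 1001, 2001),
        ("AON property L1", 2001, 3001),
        ("AON Property L2", 3001, 4001),
        ("AON Property L3", 4001, None),
        ("LONG CHAIN CLUBS", 1002, 2222),
        ("LONG CHAIN L1", 2222, 3003),
        ("LONG CHAIN L2", 3003, None),
    ],
)

query = """
WITH RECURSIVE walk(summary, issue_id, parent_id,
                    penultimate_ancestor_id, ancestor_id, ancestor_summary) AS (
    -- Anchor: every row starts as its own "current ancestor".
    SELECT summary, issue_id, parent_id, issue_id, parent_id, summary
    FROM issues
    UNION ALL
    -- Step: move one level up while keeping the original row's columns.
    SELECT walk.summary, walk.issue_id, walk.parent_id,
           issues.issue_id, issues.parent_id, issues.summary
    FROM issues
    JOIN walk ON issues.issue_id = walk.ancestor_id
)
SELECT w.summary, w.issue_id, w.parent_id,
       child.summary      AS succeed_desc,
       w.ancestor_summary AS parent_desc
FROM walk AS w
LEFT JOIN issues AS child
       ON w.parent_id IS NOT NULL
      AND child.parent_id = w.penultimate_ancestor_id
WHERE w.ancestor_id IS NULL          -- keep only rows where the walk reached the top
ORDER BY parent_desc, w.issue_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the final join assumes each top-level issue has a single direct child, as in the sample data; with a branching hierarchy it would return one row per child.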

Related

Fetch rows with same id and different prod_id

I have two tables: tbltest1 and tbltest2
I want all the distinct rows of both tables, except the rows with a null prod_id, unless no row in either table with the same id has a non-null prod_id.
I tried building a set with all the values, applying DISTINCT to keep only the unique ones, and then using ROW_NUMBER() OVER ():
with p as(
select t.*
from tbltest1 as t
union all
select d.*
from tbltest2 as d
),
s as (
select distinct colb, num,
ROW_NUMBER() OVER (PARTITION BY num ORDER BY colb DESC) as rnk
from p
)select *
from s
where rnk = 1
How can I achieve that? Is there also any other more efficient way to do it instead of this logic?

Use UNION for the 2 tables to remove the duplicates (if any) and then NOT EXISTS:
WITH cte AS (
SELECT prod_id, dn FROM tbltest2
UNION
SELECT prod_id1, dn1 FROM tbltest1
)
SELECT c1.*
FROM cte c1
WHERE c1.prod_id IS NOT NULL
OR NOT EXISTS (SELECT 1 FROM cte c2 WHERE c2.dn = c1.dn AND c2.prod_id IS NOT NULL)
See the demo.
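Since the original tables are not shown, here is a small sketch of the same UNION plus NOT EXISTS query on made-up rows, run through Python's sqlite3 module. The column names follow the answer's query; the sample values are assumptions chosen so that one id has both a null and a non-null prod_id, and another id has only a null one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbltest1 (prod_id1 INTEGER, dn1 INTEGER)")
conn.execute("CREATE TABLE tbltest2 (prod_id INTEGER, dn INTEGER)")
# Hypothetical sample data: dn 1 has a real prod_id, dn 2 only has NULL.
conn.executemany("INSERT INTO tbltest1 VALUES (?, ?)", [(None, 1), (10, 1), (None, 2)])
conn.executemany("INSERT INTO tbltest2 VALUES (?, ?)", [(10, 1), (None, 2), (20, 3)])

rows = conn.execute("""
WITH cte AS (
    SELECT prod_id, dn FROM tbltest2
    UNION                       -- UNION (not UNION ALL) removes duplicate rows
    SELECT prod_id1, dn1 FROM tbltest1
)
SELECT c1.prod_id, c1.dn
FROM cte c1
WHERE c1.prod_id IS NOT NULL
   OR NOT EXISTS (SELECT 1 FROM cte c2
                  WHERE c2.dn = c1.dn AND c2.prod_id IS NOT NULL)
ORDER BY c1.dn
""").fetchall()

print(rows)
```

The NULL row for dn 1 is dropped because a non-null prod_id exists for that dn, while the NULL row for dn 2 survives because nothing better exists.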

Getting MAX datetime event from multiple tables, and outputting a simple list of most recent events by ID

I have a table:
and multiple other tables - consider them purchases, in this example:
And would like an output table to show the most recent purchase (NB that there may be multiple instances of a purchase within each table), by id from the main table:
The id can be a customer number, for example.
I've tried using OUTER APPLY on each purchase table, getting the TOP 1 by datetime desc, then getting the max value from the OUTER APPLY tables, but I would not get the table name - eg. Apples, just the datetime.
Another idea was to UNION all of the purchase tables together in a join with the main table (by id), and pick out the top 1 datetime and a table name, but I don't think this would be very efficient for a lot of rows:
SELECT MT.id, MT.gender, MT.age,
b.Name as LastPurchase, b.dt as LastPurchaseDateTime
FROM MainTable MT
LEFT JOIN (
SELECT id, Name, MAX(dt) FROM
(
SELECT id, 'Apples' as Name, ApplesDateTime as dt FROM ApplesTable
UNION
SELECT id, 'Pears' as Name, PearsDateTime as dt FROM PearsTable
UNION
SELECT id, 'Bananas' as Name, BananasDateTime as dt FROM BananasTable
)a
GROUP BY etc
)b
Does anyone have a more sensible idea?
Many thanks in advance.

I would go for a lateral join:
select m.*, x.*
from maintable m
outer apply (
select top (1) x.*
from (
select id, 'apples' as name, applesdatetime as dt from applestable
union all select id, 'pears', pearsdatetime from pearstable
union all select id, 'bananas', bananasdatetime from bananastable
) x
where x.id = m.id
order by dt desc
) x

I would suggest apply:
SELECT MT.id, mt.gender, mt.age, p.*
FROM MainTable MT OUTER APPLY
(SELECT TOP (1) p.name, p.dt
FROM (SELECT id, 'Apples' as Name, ApplesDateTime as dt FROM ApplesTable
UNION ALL
SELECT id, 'Pears' as Name, PearsDateTime as dt FROM PearsTable
UNION ALL
SELECT id, 'Bananas' as Name, BananasDateTime as dt FROM BananasTable
) p
WHERE p.id = mt.id
ORDER BY dt DESC
) p

How to join two tables with the same number of rows in SQLite?

I have almost the same problem as described in this question. I have two tables with the same number of rows, and I would like to join them together one by one.
The tables are ordered, and I would like to keep this order after the join, if it is possible.
There is a rowid based solution for MSSql, but in SQLite rowid can not be used if the table is coming from a WITH statement (or RECURSIVE WITH).
It is guaranteed that the two tables have the exact same number of rows, but this number is not known beforehand. It is also important to note, that the same element may occur more than twice. The results are ordered, but none of the columns are unique.
Example code:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, table_b.s
FROM table_a
LEFT JOIN table_b ON ( table_a.rowid = table_b.rowid )
The result I would like to achieve is:
(2, 'valuex'),
(4, 'valuey'),
(5, 'valuez')
SQLFiddle: http://sqlfiddle.com/#!5/9eecb7/6888

This is quite complicated in SQLite -- because you are allowing duplicates. But you can do it. Here is the idea:
Summarize the table by the values.
For each value, get the count and offset from the beginning of the values.
Then use a join to associate the values and figure out the overlap.
Finally use a recursive CTE to extract the values that you want.
The following code assumes that n and s are ordered -- as you specify in your question. However, it would work (with small modifications) if another column specified the ordering.
You will notice that I have included duplicates in the sample data:
WITH table_a (n) AS (
SELECT 2 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuez' UNION ALL
SELECT 'valuez'
),
a as (
select a.n, count(*) as a_cnt,
(select count(*) from table_a a2 where a2.n < a.n) as a_offset
from table_a a
group by a.n
),
b as (
select b.s, count(*) as b_cnt,
(select count(*) from table_b b2 where b2.s < b.s) as b_offset
from table_b b
group by b.s
),
ab as (
select a.*, b.*,
max(a.a_offset, b.b_offset) as offset,
min(a.a_offset + a.a_cnt, b.b_offset + b.b_cnt) - max(a.a_offset, b.b_offset) as cnt
from a join
b
on a.a_offset + a.a_cnt - 1 >= b.b_offset and
a.a_offset <= b.b_offset + b.b_cnt - 1
),
cte as (
select n, s, offset, cnt, 1 as ind
from ab
union all
select n, s, offset, cnt, ind + 1
from cte
where ind < cnt
)
select n, s
from cte
order by n, s;
Here is a DB Fiddle showing the results.
I should note that this would be much simpler in almost any other database, using window functions (or perhaps variables in MySQL).

Since the tables are ordered, you can generate row id values by comparing n values.
Still, the best way to get good performance would be to store the ID values when creating the tables.
http://sqlfiddle.com/#!5/9eecb7/7014
WITH
table_a_a (n, id) AS
(
WITH table_a (n) AS
(
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
) ,
table_b_b (n, id) AS
(
WITH table_a (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
)
select table_a_a.n,table_b_b.n from table_a_a,table_b_b where table_a_a.ID = table_b_b.ID
or convert the input set to comma separated list and try like this:
http://sqlfiddle.com/#!5/9eecb7/7337
WITH RECURSIVE table_b( id,element, remainder ) AS (
SELECT 0,NULL AS element, 'valuex,valuey,valuz,valuz' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_b
WHERE remainder IS NOT NULL
),
table_a( id,element, remainder ) AS (
SELECT 0,NULL AS element, '2,4,5,7' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_a
WHERE remainder IS NOT NULL
)
SELECT table_b.element, table_a.element FROM table_b, table_a WHERE table_a.element IS NOT NULL and table_a.id = table_b.id;

SQL
SELECT a1.n, b1.s
FROM table_a a1
LEFT JOIN table_b b1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
(SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
Explanation
The query simply counts the number of rows up until the current one for each table (based on the ordering column) and joins on this value.
Demo
See SQL Fiddle demo.
Assumptions
A single column is used for the ordering in each table. (But the query could easily be modified to allow multiple ordering columns).
The ordering values in each table are unique.
The values in the ordering column aren't necessarily the same between the two tables.
It is known that table_a contains either the same or more rows than table_b. (If this isn't the case then a FULL OUTER JOIN would need to be emulated since SQLite doesn't provide one.)
No further changes to the table structure are allowed. (If they are, it would be more efficient to have pre-populated columns for the ordering).

Either way...
Use something like
WITH
v_table_a (n, rowid) AS (
SELECT 2, 1
UNION ALL
SELECT 4, 2
UNION ALL
SELECT 5, 3
),
v_table_b (s, rowid) AS (
SELECT 'valuex', 1
UNION ALL
SELECT 'valuey', 2
UNION ALL
SELECT 'valuez', 3
)
SELECT v_table_a.n, v_table_b.s
FROM v_table_a
LEFT JOIN v_table_b ON ( v_table_a.rowid = v_table_b.rowid );
for "virtual" tables (with WITH or without),
WITH RECURSIVE vr_table_a (n, rowid) AS (
VALUES (2, 1)
UNION ALL
SELECT n + 2, rowid + 1 FROM vr_table_a WHERE rowid < 3
)
, vr_table_b (s, rowid) AS (
VALUES ('I', 1)
UNION ALL
SELECT s || 'I', rowid + 1 FROM vr_table_b WHERE rowid < 3
)
SELECT vr_table_a.n, vr_table_b.s
FROM vr_table_a
LEFT JOIN vr_table_b ON ( vr_table_a.rowid = vr_table_b.rowid );
for "virtual" tables using recursive WITHs (in this example the values differ from yours, but I guess you get the point) and
CREATE TABLE p_table_a (n INT);
INSERT INTO p_table_a VALUES (2), (4), (5);
CREATE TABLE p_table_b (s VARCHAR(6));
INSERT INTO p_table_b VALUES ('valuex'), ('valuey'), ('valuez');
SELECT p_table_a.n, p_table_b.s
FROM p_table_a
LEFT JOIN p_table_b ON ( p_table_a.rowid = p_table_b.rowid );
for physical tables.
I'd be careful with the last one though. A quick test shows that rowid values a) are reused: when some rows are deleted and others are inserted, the inserted rows get the rowids of the old rows (i.e. rowid in SQLite isn't unique past the lifetime of a row, whereas e.g. Oracle's rowid AFAIR is), and b) correspond to the order of insertion. But I don't know, and didn't find a clue in the documentation, whether that's guaranteed or subject to change in other/future implementations. Or maybe it's just a coincidence in my test environment.
(In general, the physical order of rows may change (even within the same database using the same DBMS, as a result of some reorganization) and is therefore not a good thing to rely on. Nor is it guaranteed that a query will return its result ordered by physical position in the table (it might use the order of some index instead, or have a partial result ordered some other way that influences the output's order). Consider designing your tables with common (sort) keys in corresponding rows, to order by and to join on.)
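The closing suggestion, a shared sort key stored in both tables, can be sketched as follows with Python's sqlite3 module. The column name pos and the duplicated sample values are assumptions for this illustration; the point is only that an explicit key makes the pairing independent of rowid behaviour and of duplicate values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: pos is an explicit ordinal stored with each row.
conn.execute("CREATE TABLE table_a (pos INTEGER PRIMARY KEY, n INTEGER)")
conn.execute("CREATE TABLE table_b (pos INTEGER PRIMARY KEY, s TEXT)")
conn.executemany("INSERT INTO table_a VALUES (?, ?)",
                 [(1, 2), (2, 4), (3, 4), (4, 5)])
conn.executemany("INSERT INTO table_b VALUES (?, ?)",
                 [(1, "valuex"), (2, "valuey"), (3, "valuey"), (4, "valuez")])

# Pair rows one by one on the shared key and keep the original order.
pairs = conn.execute("""
SELECT a.n, b.s
FROM table_a a
LEFT JOIN table_b b ON a.pos = b.pos
ORDER BY a.pos
""").fetchall()

print(pairs)
```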

You can create temp tables to hold the CTE rows, then JOIN them on SQLite's rowid column.
CREATE TEMP TABLE temp_a(n integer);
CREATE TEMP TABLE temp_b(n VARCHAR(255));
WITH table_a(n) AS (
SELECT 2 n
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 5
)
INSERT INTO temp_a (n) SELECT n FROM table_a;
WITH table_b (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
UNION ALL
SELECT 'valuew'
)
INSERT INTO temp_b (n) SELECT n FROM table_b;
SELECT *
FROM temp_a a
INNER JOIN temp_b b on a.rowid = b.rowid;
sqlfiddle:http://sqlfiddle.com/#!5/9eecb7/7252

It is possible to use the rowid inside a with statement but you need to select it and make it available to the query using it.
Something like this:
with tablea AS (
select id, rowid AS rid from someids),
tableb AS (
select details, rowid AS rid from somedetails)
select tablea.id, tableb.details
from
tablea
left join tableb on tablea.rid = tableb.rid;
It is, however, as others have already warned you, a really bad idea. What if the app breaks after inserting into one table but before the other one? What if you delete an old row? If you want to join two tables, you need to specify the field to join on. There are so many things that could go wrong with this design. The closest sound alternative would be an incremental id field that you save in the table and use in your application. Even simpler, merge the two into one table.
Read this link for more information about the rowid: https://www.sqlite.org/lang_createtable.html#rowid
sqlfiddle: http://sqlfiddle.com/#!7/29fd8/1

The problem statement indicates:
The tables are ordered
If this means that the ordering is defined by the ordering of the values in the UNION ALL statements, and if SQLite respects that ordering, then the following solution may be of interest because, apart from small tweaks to the last three lines of the sample program, it adds just two lines:
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
That is, table A is table_a augmented with a rowid, and similarly for table B.
Unfortunately, there is a caveat, though it might just be the result of my not having found the relevant specifications. Before delving into that, however, here is the full proposed solution:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
),
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
SELECT A.n, B.s
FROM A LEFT JOIN B
ON ( A.rid = B.rid );
Caveat
The proposed solution has been tested against a variety of data sets using sqlite version 3.29.0, but whether or not it is, and will continue to be, "guaranteed" to work is unclear to me.
Of course, if SQLite offers no guarantees with respect to the ordering of the UNION ALL statements (that is, if the question is based on an incorrect assumption), then it would be interesting to see a well-founded reformulation.

Diff between two tables (using sql) -> incremental changes

I have a need to identify differences between two tables. I have looked at sql query to return differences between two tables but it was a bit too different for me to extrapolate with my current SQL skills.
Table A is a snapshot of a certain group of people where the snapshot was taken yesterday, where each row is a unique person and certain characteristics about the person. Table B is the same snapshot taken 24 hours later. Within the 24 hour period:
New people may have been added.
People from yesterday may have been removed.
People from yesterday may have changed (i.e., original row is there, but one or more column values have changed).
My output should have the following:
a row for each new person added
a row for each person removed
a row for each person who has changed
I would be grateful for any ideas. Thanks!

This type of problem has a very simple and efficient solution that does not use joins (it doesn't even use a union of the results of two MINUS operations); it just uses one union and a GROUP BY operation. The solution was developed in a thread on AskTom many years ago; it is surprising that it is not more widely known and used. For example (but not only): https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:24371552251735
In your case, assuming there is a primary key constraint on PERSON_ID (which makes the solution simpler):
select max(flag) as flag, PERSON_ID, first_name, last_name, (etc. - all the columns)
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by PERSON_ID, first_name, last_name, (etc.)
having count(*) = 1
order by PERSON_ID -- optional
;
If for a PERSON_ID all the data is the same in both tables, that results in a count of 2 for that group, so it won't pass the HAVING condition. The only groups with a count of 1 (and therefore exactly one row each!) are rows that are in one table but not the other. If a person was added, that will show only one row, with the flag = 'new'. If a person was deleted, you will get only one row, with the flag 'old'. If there were updates, the same PERSON_ID will appear twice, but since at least one field is different, the two rows (one with flag 'new' and the other with 'old') will be in different groups, they will pass the HAVING filter, and they will BOTH be in the output.
Which is slightly different from what you requested; you will get both the old AND the new information for updates, labeled as 'old' and 'new'. You said you wanted only one of those but didn't state which one. This will give you both (which makes more sense anyway), but if you really only want one, it can be done easily in the query above.
Note - the outer select must have max(flag) rather than flag because flag is not a GROUP BY column; but it's the max() over exactly one row, so it WILL be the flag for that row anyway.
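The grouping behaviour described above can be checked on a small data set. The sketch below runs the same UNION ALL / GROUP BY / HAVING pattern through Python's sqlite3 module instead of Oracle; the four-row snapshots are illustrative, and the inner column is named src (an assumption, to keep it distinct from the outer flag alias).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("old_table", "new_table"):
    conn.execute(
        f"CREATE TABLE {table} (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
conn.executemany("INSERT INTO old_table VALUES (?, ?, ?)", [
    (101, "John", "Smith"), (102, "Mary", "Green"),
    (103, "July", "Dobbs"), (104, "Will", "Scott"),
])
conn.executemany("INSERT INTO new_table VALUES (?, ?, ?)", [
    (101, "Joe", "Smith"), (102, "Mary", "Green"),
    (104, "Will", "Scott"), (105, "Andy", "Brown"),
])

diff = conn.execute("""
SELECT MAX(src) AS flag, person_id, first_name, last_name
FROM (SELECT 'old' AS src, o.* FROM old_table o
      UNION ALL
      SELECT 'new' AS src, n.* FROM new_table n)
GROUP BY person_id, first_name, last_name
HAVING COUNT(*) = 1            -- identical rows form a group of 2 and drop out
ORDER BY person_id, flag
""").fetchall()

for row in diff:
    print(row)
```

Unchanged people (102 and 104) disappear, the changed person (101) shows up once per snapshot, and the removed (103) and added (105) people each appear once.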
Added - OP indicated he would like to get only the "new" row for a person with updated (changed, modified) data. The approach shown below will change the flag to "changed" in this case.
with old_table ( person_id, first_name, last_name ) as (
select 101, 'John', 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 103, 'July', 'Dobbs' from dual union all
select 104, 'Will', 'Scott' from dual
),
new_table ( person_id, first_name, last_name ) as (
select 101, 'Joe' , 'Smith' from dual union all
select 102, 'Mary', 'Green' from dual union all
select 104, 'Will', 'Scott' from dual union all
select 105, 'Andy', 'Brown' from dual
)
-- end of test data; solution (SQL query) begins below this line
select case ct when 1 then flag else 'changed' end as flag,
person_id, first_name, last_name
from (
select max(flag) as flag, person_id, first_name, last_name,
count(*) over (partition by person_id) as ct,
row_number() over (partition by person_id order by max(flag)) as rn
from ( select 'old' as flag, t1.*
from old_table t1
union all
select 'new' as flag, t2.*
from new_table t2
)
group by person_id, first_name, last_name
having count(*) = 1
)
where rn = 1
order by person_id -- ORDER BY clause is optional
;
Output:
FLAG     PERSON_ID FIRST_NAME LAST_NAME
------- ---------- ---------- ---------
changed        101 Joe        Smith
old            103 July       Dobbs
new            105 Andy       Brown

The first 2 parts are easy:
select 'New', name from B where not exists (select name from A where A.name=B.name)
union select 'Removed', name from A where not exists (select name from B where B.name = A.name)
The last one is where you need to compare characteristics. How many of them are there? Do you want to list what has changed or only that they have changed?
For argument's sake, let us only say that the characteristics are address and telephone #:
union select 'Phone', A.name from A,B where A.name = B.name and A.telephone != B.telephone
union select 'Address', A.name from A,B where A.name = B.name and A.address != B.address

Note: The question isn't currently tagged with the dbms. I use sql-server, so that's what I used to write the below. There may be slight differences in another dbms.
You can do something along these lines:
select *
from TableA a
right join TableB b on b.ID = a.ID
where a.ID is null -- added since yesterday
union
select *
from TableA a
left join TableB b on b.ID = a.ID
where b.ID is null -- removed since yesterday
union
select *
from TableA a
inner join TableB b on b.ID = a.ID -- restrict to records in both tables
where a.SomeValue <> b.SomeValue
or a.SomeOtherValue <> b.SomeOtherValue
--etc
Each select handles one portion of your expected output. In this manner, they'd all be joined into 1 result set. If you drop the union, you'll end up with a separate set for each.

I suggest to use Except to get the changed records. The below query should work if the db is sql server.
-- added since yesterday
SELECT B.*
FROM TableA A
RIGHT OUTER JOIN TableB B on B.ID = A.ID
WHERE A.ID IS NULL
UNION
-- removed since yesterday
SELECT A.*
FROM TableA A
LEFT OUTER JOIN TableB B on B.ID = A.ID
WHERE B.ID IS NULL
UNION
-- changed since yesterday (returns today's values from TableB)
SELECT B.* FROM TableB B WHERE EXISTS(SELECT A.ID FROM TableA A WHERE A.ID = B.ID)
EXCEPT
SELECT A.* FROM TableA A WHERE EXISTS(SELECT B.ID FROM TableB B WHERE B.ID = A.ID)

Assuming you have a unique id for each person in the table, you can use full outer join:
select coalesce(ty.customerid, tt.customerid) as customerid,
(case when ty.customerid is null then 'New'
when tt.customerid is null then 'Removed'
else 'Modified'
end) as status
from tyesterday ty full outer join
ttoday tt
on ty.customerid= tt.customerid
where ty.customerid is null or
tt.customerid is null or
(tt.col1 <> ty.col1 or tt.col2 <> ty.col2 or . . . ); -- may need to take `NULL`s into account

mathguy provided a successful answer to my initial problem. I asked him for a revision (to make it even better). He provided a revision, but I am getting a "missing keyword" error when executing against my code. Here is my code:
select case when ct = 1 then flag else 'changed' as flag, PERSON_ID, FIRSTNAME, LASTNAME
from (
select max(flag), PERSON_ID, FIRSTNAME, LASTNAME
count() over (partition by PERSON_ID) as ct,
row_number() over (partition by PERSON_ID
order by case when flag = 'new' then 0 end) as rn
from ( select 'old' as flag, t1.*
from YESTERDAY_TABLE t1
union all
select 'new' as flag, t2.*
from TODAY_TABLE t2
)
group by PERSON_ID, FIRSTNAME, LASTNAME
having count(*) = 1
)
where rn = 1
order by PERSON_ID;

ORACLE join two table with comma separated ids

I have two tables
Table 1
ID NAME
1 Person1
2 Person2
3 Person3
Table 2
ID GROUP_ID
1 1
2 2,3
The IDs in all the columns above refer to the same ID (Example - a Department)
My Expected output (by joining both the tables)
GROUP_ID NAME
1 Person1
2,3 Person2,Person3
Is there a query with which I can achieve this.

It can be done. You shouldn't do it, but perhaps you don't have the power to change the world. (If you have a say in it, you should normalize your table design - in your case, both the input and the output fail the first normal form).
Answering more as good practice for myself... This solution guarantees that the names will be listed in the same order as the id's. It is not the most efficient, and it doesn't deal with id's in the list that are not found in the first table (it simply discards them instead of leaving a marker of some sort).
with
table_1 ( id, name ) as (
select 1, 'Person1' from dual union all
select 2, 'Person2' from dual union all
select 3, 'Person3' from dual
),
table_2 ( id, group_id ) as (
select 1, '1' from dual union all
select 2, '2,3' from dual
),
prep ( id, lvl, token ) as (
select id, level, regexp_substr(group_id, '[^,]+', 1, level)
from table_2
connect by level <= regexp_count(group_id, ',') + 1
and prior id = id
and prior sys_guid() is not null
)
select p.id, listagg(t1.name, ',') within group (order by p.lvl) as group_names
from table_1 t1 inner join prep p on t1.id = p.token
group by p.id;
ID GROUP_NAMES
---- --------------------
1 Person1
2 Person2,Person3

select t2.group_id, listagg(t1.name,',') WITHIN GROUP (ORDER BY 1)
from table2 t2, table1 t1
where ','||t2.group_id||',' like '%,'||t1.id||',%'
group by t2.id, t2.group_id
Normalize your data model; this is a perversion! A comma-separated list should not exist in a database, only individual rows per data unit.
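The LIKE condition in the query above wraps both sides in commas, and the reason is easy to miss. The following sketch, run with Python's sqlite3 module rather than Oracle, compares the wrapped match with a naive substring match. The third table_2 row ('12,13') is a hypothetical addition made only to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table_2 (id INTEGER, group_id TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [(1, "Person1"), (2, "Person2"), (3, "Person3")])
# The '12,13' row is hypothetical; none of its ids exist in table_1.
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [(1, "1"), (2, "2,3"), (3, "12,13")])

# Delimiter-wrapped match: ',2,3,' LIKE '%,2,%' only matches whole list items.
wrapped = conn.execute("""
SELECT t2.group_id, t1.name
FROM table_2 t2
JOIN table_1 t1
  ON (',' || t2.group_id || ',') LIKE ('%,' || t1.id || ',%')
ORDER BY t2.id, t1.id
""").fetchall()

# Naive substring match: id 1 also "matches" inside '12,13'.
naive = conn.execute("""
SELECT t2.group_id, t1.name
FROM table_2 t2
JOIN table_1 t1
  ON t2.group_id LIKE ('%' || t1.id || '%')
ORDER BY t2.id, t1.id
""").fetchall()

print(wrapped)
print(naive)
```

Even with the wrapping, this kind of join cannot use an index on the list column, which is one practical reason behind the advice to normalize.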
