SQL: update a table that refers to itself

I have a Table ProfileText which contains four columns: Id, Text, State, PreviousText.
State is an enum with the following states: released, draft.
PreviousText is a reference to another ProfileText.
Whenever a new ProfileText is created, the previous ProfileText is set as its PreviousText, regardless of State. This creates a kind of "timeline" of released and drafted texts. Now I want to modify PreviousText so that the "timeline" only contains entries with State = released.
Example old: A(released) -> B(draft) -> C(released) -> null
Example new: A(released) -> C(released) -> null
How can I perform this update in SQL?

If I understand your goal correctly, you should be able to accomplish this with an update that makes use of a recursive CTE (common table expression):
with recursive timeline as (
    select * from ProfileText where State = 'released'
    union
    select a.Id, a.Text, a.State, b.PreviousText
    from timeline as a
    join ProfileText as b on b.Id = a.PreviousText
    where b.State <> 'released'
)
update ProfileText as a
set PreviousText = b.PreviousText
from timeline as b
left join ProfileText as c on c.Id = b.PreviousText
where a.Id = b.Id
  and (b.PreviousText is null
       or (c.State = 'released' and a.PreviousText <> b.PreviousText));
The non-recursive part of the CTE selects all "released" records, regardless of the state of the previous records they reference. For any record whose PreviousText points at a record that is not released, the recursive part yields an additional row with the released entry's original values but with the reference moved to that previous entry's own PreviousText, continuing until it reaches a released record or a null reference.
The update then touches every entry that appears in the newly constructed timeline and either ends up with a null reference (necessary when the oldest released entry is not the oldest entry overall) or ends up referencing a "released" record different from the one it currently points to.
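To make the walk-through concrete, here is a throwaway sketch of the question's A(released) -> B(draft) -> C(released) example in PostgreSQL; the integer Ids and the plain text/int column types are made up for illustration:

create table ProfileText (
    Id           int primary key,
    Text         text,
    State        text,          -- 'released' or 'draft' in this sketch
    PreviousText int references ProfileText(Id)
);

insert into ProfileText values
    (3, 'C', 'released', null),  -- oldest entry
    (2, 'B', 'draft',    3),
    (1, 'A', 'released', 2);     -- newest entry, currently pointing at the draft B

Running just the CTE on this data yields the row (3, 'C', 'released', null) unchanged, the original row (1, 'A', 'released', 2), and one recursive row (1, 'A', 'released', 3) where the reference has been moved past the draft. The update keeps only the rows whose reference is null or released, so A ends up pointing directly at C, giving the A(released) -> C(released) -> null timeline from the question.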

Related

Change Data Capture Using Spark SQL

I have a few tables which are related as A -> Left Join -> B -> Left Join -> C. Let's call A the driving table and B & C the "supporting" tables. Each of these tables has a last-update-date column. My requirement is to identify the records that changed since the last processing date (available as a parameter), not only in the driving table but also when a change to any column occurs in the supporting table(s).
Table A
------
empid|salary|last_updt_dt
123|20000|05/14/2019
Table B
-------
empid|fname|lname|last_updt_date
123|John|Taylor|05/16/2019
Table C
-------
empid|address|last_updt_dt
123|Maryland|05/17/2019
Assume the last processing date = 05/10/2019.
So, when the job executes on Day 1 (05/20/2019), the output should be:
empid|fname|lname|salary|address|last_exec_date
-----------------------------------------------
123|John|Taylor|20000|Maryland|05/20/2019
Now, let's assume that on Day 2 (05/21/2019), the address got changed from Maryland to California. So, on Day 2, the output table should look like:
empid|fname|lname|salary|address|last_exec_date
-----------------------------------------------
123|John|Taylor|20000|Maryland|05/20/2019
123|John|Taylor|20000|California|05/21/2019
561|Peter|Anderson|50000|Missouri|05/21/2019
The point to note is that on Day 2 a change in any "supporting" table (the 'address' column of Table C in this case) triggers the insertion of another record for an empid that was already processed the day before, but now with the updated address value. Also note that on Day 2 regular inserts still happen as usual for any other qualifying record, e.g. empid = 561.
SELECT
    A.empid, B.fname, B.lname, A.salary, C.address, current_date() as last_exec_date
from A
left outer join B
    on A.empid = B.empid
left outer join C
    on A.empid = C.empid
where to_date(A.last_updt_dt, 'yyyyMMdd') > {last_exec_date}
   OR to_date(B.last_updt_date, 'yyyyMMdd') > {last_exec_date}
   OR to_date(C.last_updt_dt, 'yyyyMMdd') > {last_exec_date}
My challenge is how to detect and propagate a change from any of the participating supporting tables, even when that change pertains to a record which had already been processed and inserted into the target table earlier, so that a new record with the updated value shows up in the target table.
In other words, how can I emit a record when the change comes from one of the supporting (non-driver) tables?
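For what it's worth, a minimal sketch of one way to express this in Spark SQL, assuming the table and column names above, a pre-existing target table named target_emp, and a last_proc_date substitution variable for the previous run date (target_emp, last_proc_date and the MM/dd/yyyy format are assumptions for illustration). A row qualifies, and is appended with its current values, as soon as any of the three last-update dates moved past the last processing date:

INSERT INTO target_emp
SELECT A.empid, B.fname, B.lname, A.salary, C.address,
       current_date() AS last_exec_date
FROM A
LEFT OUTER JOIN B ON A.empid = B.empid
LEFT OUTER JOIN C ON A.empid = C.empid
-- the date format must match how last_updt_dt / last_updt_date are actually stored
WHERE to_date(A.last_updt_dt,   'MM/dd/yyyy') > to_date('${last_proc_date}', 'MM/dd/yyyy')
   OR to_date(B.last_updt_date, 'MM/dd/yyyy') > to_date('${last_proc_date}', 'MM/dd/yyyy')
   OR to_date(C.last_updt_dt,   'MM/dd/yyyy') > to_date('${last_proc_date}', 'MM/dd/yyyy');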

Find latest object in hierarchy where the latest isn't marked as deleted

I have two tables (of interest). The first table is a simple hierarchy. Each row has an ID, and a parent ID. Obviously, the parent ID can be NULL when we reach the top of that particular hierarchy. (We store multiple trees here, so there can be multiple NULL parents, but that's probably not important here.)
The second table has objects that have a non-unique identifier, for example a name, a timestamp, and a reference to the first table to indicate where on the hierarchy it sits.
Let's say the first table has a hierarchy of /A/B/C, and the second has a bunch of objects named "Foo". If I'm trying to get the latest Foo in /A/B, then I don't want to get anything from C. This seems straightforward enough. However, if the latest "Foo" in /A/B is marked in the database with a field saying it is deleted, e.g., status = 'deleted', I want to instead get the latest "Foo" in /A even if there are other "Foo" objects with earlier timestamps in /A/B.
Is this possible to do in a CTE? Or do I have to resort to a stored procedure to get this type of logic? I'm already using some stored procedures just for refactoring purposes, so that's not a barrier, but if I can do this in a simpler manner that I'm missing, that may be better (including for performance).
Since that's probably a bit vague, I put this on SQLFiddle. If I add in the override on line 24 of the schema, I should get that as the output. However, if I also add the deleted object on line 26, I need to get back to the "update in /A" as the output.
I would extend your code with one intermediate step:
with recursive _rpath as (
    select
        0 as level,
        id, parentid, name
    from path
    where id = 5 -- this would be filled in later
    union all
    select
        child.level + 1 as level,
        parent.id, parent.parentid, parent.name
    from _rpath child
    join path parent on child.parentid = parent.id
), c as (
    select
        rp, d, d.status,
        row_number() over (partition by d.pathid order by d.creation desc) as rn
    from data d
    join _rpath rp on rp.id = d.pathid
), datapaths as (
    select *
    from c
    where rn = 1
      and status != 'deleted'
)
select dp.rp, dp.d
from datapaths dp
left join datapaths dpNext
    on (dpNext.rp).level < (dp.rp).level
    or ((dpNext.rp).level = (dp.rp).level
        and (dpNext.d).creation > (dp.d).creation)
where (dpNext.d).id is null;
DBFiddle Demo
How it works:
-- number the rows for each pathid, sorted by creation descending,
-- so the newest one always gets rn = 1
row_number() over (partition by d.pathid order by d.creation desc)
-- keep only the first row for each pathid, but omit it if it is 'deleted'
where rn = 1
  and status != 'deleted'
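The final left join against datapaths itself acts as an anti-join: a row survives only if no other candidate sits on a lower level (closer to the starting node) and no candidate on the same level is newer. If only a single winner is needed, an equivalent formulation under that same assumption would be:

select dp.rp, dp.d
from datapaths dp
order by (dp.rp).level asc, (dp.d).creation desc
limit 1;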

TSQL - Insert values in table based on lookup table

I have a master table in SQL Server 2012 whose records are of two types.
These are the critical columns I have in my master table:
TYPE CATEGORYGROUP CREDITORNAME DEBITORNAME
One of those two types (Type B) doesn't have a CategoryGroup assigned to it (so it's always null).
Type A always has a CategoryGroup and Debitor.
Type B always has a Creditor but no CategoryGroup.
For creditor and debitor I have two extra tables that also hold the CategoryGroup but for my task I only need the table for creditors since I already have the right value for type A (debitors).
So my goal is to look-up the CategoryGroup in the creditor table based on the creditor name and ideally put those CategoryGroup values in my master table.
With "put" I'm not sure if a view should be generated or actually put the data in the table which contains about 1.5 million records and keeps on growing.
There's also a "cluster table" that uses the CategoryGroup as a key field. But this isn't part of my problem here.
Please have a look at my sample fiddle
Hope you can help me.
Thank you.
If I understand you correctly, you can simply do a join to find the correct value, and update MainData with that value;
You can either use a common table expression...
WITH cte AS (
    SELECT a.*, b.categorygroup cg
    FROM MainData a
    JOIN CreditorList b
      ON a.creditorname = b.creditorname
)
UPDATE cte SET categorygroup = cg;
An SQLfiddle to test with.
...or an UPDATE/JOIN;
UPDATE m
SET m.categorygroup = c.categorygroup
FROM maindata m
JOIN creditorlist c
ON m.creditorname = c.creditorname;
Another SQLfiddle.
...and always remember to test before running potentially destructive SQL from random people on the Internet on your production data.
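One common way to do that in SQL Server is to wrap the statement in a transaction, inspect the affected rows, and roll back so nothing is persisted; a minimal sketch along those lines (the SELECT is just a spot check):

BEGIN TRANSACTION;

UPDATE m
SET m.categorygroup = c.categorygroup
FROM maindata m
JOIN creditorlist c
  ON m.creditorname = c.creditorname;

-- spot-check the result before deciding whether to keep it
SELECT TOP 100 * FROM maindata WHERE categorygroup IS NOT NULL;

ROLLBACK TRANSACTION;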
EDIT: To just see the data in the same format without doing the update, you can use;
SELECT
a.type, COALESCE(a.categorygroup, b.categorygroup) categorygroup,
a.creditorname, a.debitorname
FROM MainData a
LEFT JOIN CreditorList b
ON a.creditorname = b.creditorname
Yet another SQLfiddle.
Couldn't you just do:
update maindata
set categorygroup = (
select top 1 categorygroup
from creditorlist
where creditorname = maindata.creditorname)
where creditorname is not null
and categorygroup is null
?
Try this -
update m
set m.CategoryGroup = cl.CategoryGroup
-- select m.creditorName,
-- m.CategoryGroup as Dest,
-- cl.CategoryGroup as Src
from maindata as m
left join creditorlist as cl
on m.creditorName = cl.creditorName
where m.creditorName is not null
Before you update, you can check the results of the query by uncommenting the SELECT lines and commenting out the UPDATE ... SET part.

Updating a table by referencing another table

I have a table CustPurchase (name, purchase) and another table CustID (id, name).
I altered the CustPurchase table to have an id field. Now, I want to populate this newly created field by referencing the customer ids from the CustID table, using:
UPDATE CustPurchase
SET CustPurchase.id = CustID.id
WHERE CustPurchase.name = CustID.name;
I keep getting syntax errors!
I believe you are after the useful UPDATE FROM syntax.
UPDATE CustPurchase SET id = CI.id
FROM
CustPurchase CP
inner join CustID CI on (CI.name = CP.name)
This might have to be the following:
UPDATE CustPurchase SET id = CI.id
FROM
CustID CI
WHERE
CI.name = CustPurchase.name
Sorry, I'm away from my Postgres machine; however, based upon the reference, it looks like this is allowable. The trouble is whether or not to include the source table in the from_list.
Joining by name is not an ideal choice, but this should work:
UPDATE custpurchase
SET id = (SELECT c.id
FROM CUSTID c
WHERE c.name = custpurchase.name)
The caveat is that if there's no match, the value attempting to be inserted would be NULL. Assuming the id column won't allow NULL but will allow duplicate values:
UPDATE custpurchase
   SET id = COALESCE((SELECT c.id
                        FROM CUSTID c
                       WHERE c.name = custpurchase.name), -99)
COALESCE will return the first non-NULL value. Making this a value outside of what you'd normally expect will make it easier to isolate such records & deal with appropriately.
Otherwise, you'll have to do the updating "by hand", on a name by name basis, to correct instances that SQL could not.
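As a quick follow-up, the sentinel makes the unmatched rows easy to pull out later, for example:

SELECT * FROM custpurchase WHERE id = -99;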

SQL Query - Ensure a row exists for each value in ()

I'm currently struggling to find a way to validate two tables efficiently (Table A has lots of rows).
I have two tables
Table A
ID
A
B
C
Table matched
ID Number
A 1
A 2
A 9
B 1
B 9
C 2
I am trying to write a SQL Server query that basically checks that, for every value in Table A, there exists a row in Table matched for each value in a variable set of values (1, 2, 9).
The example above is incorrect because it should have, for every record in A, a corresponding record in Table matched for each value (1, 2, 9). The end goal is:
Table matched
ID Number
A 1
A 2
A 9
B 1
B 2
B 9
C 1
C 2
C 9
I know it's confusing, but in general, for every X in (some set) there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT prevents duplicates (i.e., A having two records in TABLE_B with the value "2") from being falsely counted as satisfying the requirement. It can be omitted if the number column has a unique or primary key constraint on it.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
Create a temp table of values you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query from.
Then, SELECT the (ID, Number) combinations from Table A and the temp table WHERE they are NOT IN (SELECT ID, Number FROM TableMatched).
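A minimal sketch of that idea in T-SQL, assuming the tables are named TableA and TableMatched and using #values as a made-up temp table name; the cross join builds every required (ID, Number) pair and the NOT EXISTS keeps only the missing ones:

-- temp table holding the required set of values
CREATE TABLE #values (Number int);
INSERT INTO #values VALUES (1), (2), (9);

-- every (ID, Number) pair that should exist but is missing from TableMatched
SELECT a.ID, v.Number
FROM TableA a
CROSS JOIN #values v
WHERE NOT EXISTS (
    SELECT 1
    FROM TableMatched m
    WHERE m.ID = a.ID
      AND m.Number = v.Number
);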
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView as
select a.ID,
       d.Number,
       m.OtherValues
from TableA a
cross join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number
This way, the rows returned were always correct. If rows were missing from TableMatched, the Numbers were still returned, but with OtherValues as null. If there were extra values in TableMatched, they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily. If a value were removed from TableMatchedDomain, it would disappear from the view. If it were added back again in the future, the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invariant on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.
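As a quick illustration of how a consumer might read that view (column names follow the sketch above), a missing TableMatched row simply shows up with a null OtherValues rather than disappearing:

select ID, Number, OtherValues
from TableMatchedView
where ID = 'A';
-- e.g. returns (A, 1, ...), (A, 2, ...), (A, 9, null) if the (A, 9) row is missing from TableMatched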