Finding updates in a table using Self-Join - sql

I have a table as shown below
tablename - property
|runId|listingId|listingName
1 123 abc
1 234 def
2 123 abcd
2 567 ghi
2 234 defg
As you can see in the table above, each row has a runId and a listingId. For a particular runId, I am trying to fetch the listings that were newly added (for runId 2, that is the 4th row, with listingId 567) and the listings that were updated (rows 3 and 5, with listingId 123 and 234 respectively).
I am trying a self-join. It works reasonably well for updated listings, but new additions are giving me trouble.
SELECT p1.* FROM property p1
INNER JOIN property p2
ON p1.listingid = p2.listingid
WHERE p1.runid=456 AND p2.runid!=456
The above query gives me the correct updated records. But I am not able to find the new listings. I tried p1.listingid != p2.listingid and a left outer join, but neither works.

I would use the ROW_NUMBER() analytical function for it.
SELECT
    T.*
FROM
    (
        SELECT
            P.*,
            -- The first run in which a listing appears counts as its insert
            CASE
                WHEN ROW_NUMBER() OVER (
                    PARTITION BY LISTINGID
                    ORDER BY RUNID
                ) = 1 THEN 'INSERTED'
                ELSE 'UPDATED'
            END AS OPERATION_
        FROM
            PROPERTY P
    ) T
WHERE
    RUNID = 2
-- AND OPERATION_ = 'INSERTED'
-- AND OPERATION_ = 'UPDATED'
This marks a listing as UPDATED if its listingid already appeared in any earlier runid, and as INSERTED otherwise.
Cheers!!
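To see the ROW_NUMBER() idea run against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only an assumption for illustration (the question does not name a database), and the function name classify_listings is made up here.

```python
import sqlite3

def classify_listings(run_id):
    """Label each listing of one run as INSERTED or UPDATED (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE property (runId INTEGER, listingId INTEGER, listingName TEXT)"
    )
    # Sample rows copied from the question
    conn.executemany(
        "INSERT INTO property VALUES (?, ?, ?)",
        [(1, 123, "abc"), (1, 234, "def"),
         (2, 123, "abcd"), (2, 567, "ghi"), (2, 234, "defg")],
    )
    rows = conn.execute(
        """
        SELECT listingId, operation
        FROM (
            SELECT p.*,
                   CASE WHEN ROW_NUMBER() OVER (
                            PARTITION BY listingId ORDER BY runId
                        ) = 1
                        THEN 'INSERTED' ELSE 'UPDATED' END AS operation
            FROM property p
        ) t
        WHERE runId = ?
        ORDER BY listingId
        """,
        (run_id,),
    ).fetchall()
    conn.close()
    return rows

print(classify_listings(2))
```

The row number is computed over the whole table before the runId filter is applied, which is why the filter sits in the outer query.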

You may try this.
-- New listings added
with cte as (
    select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId not in (
    select listingId from cte as c where slno>1
);
-- Modified listings
with cte as (
    select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId in (
    select listingId from cte as c where slno>1
);

For this, I would recommend exists and not exists. For updates:
select p.*
from property p
where exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
If you want the result for a particular runid, add the condition p.runid = ? to the outer query.
And for new listings:
select p.*
from property p
where not exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
With an index on property(listingid, runid), I would expect this to have somewhat better performance than a solution using window functions.
Here is a db<>fiddle.
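As a rough check of the exists / not exists approach, the sketch below runs both queries for a single run against the question's sample data. It assumes SQLite via Python's sqlite3 module purely for illustration; the helper name split_run and the index name are made up.

```python
import sqlite3

def split_run(run_id):
    """Return (new_listing_ids, updated_listing_ids) for one run (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE property (runId INTEGER, listingId INTEGER, listingName TEXT)"
    )
    conn.executemany(
        "INSERT INTO property VALUES (?, ?, ?)",
        [(1, 123, "abc"), (1, 234, "def"),
         (2, 123, "abcd"), (2, 567, "ghi"), (2, 234, "defg")],
    )
    # Index suggested in the answer; it supports the correlated lookup
    conn.execute("CREATE INDEX idx_property_listing_run ON property (listingId, runId)")

    template = """
        SELECT p.listingId
        FROM property p
        WHERE p.runId = ?
          AND {negate} EXISTS (SELECT 1
                               FROM property p2
                               WHERE p2.listingId = p.listingId
                                 AND p2.runId < p.runId)
        ORDER BY p.listingId
    """
    # NOT EXISTS: no earlier run has this listing, so it is new
    new = [r[0] for r in conn.execute(template.format(negate="NOT"), (run_id,))]
    # EXISTS: an earlier run has this listing, so it is an update
    updated = [r[0] for r in conn.execute(template.format(negate=""), (run_id,))]
    conn.close()
    return new, updated

print(split_run(2))
```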

Related

Mapping All Terminal IDs to Previous IDs

I have a table in SQL Server that contains a list of all ID migrations over time. An individual's ID can change over time, and this table helps us understand when each change occurs and what the ID changes from and to. What I'd ultimately like is a way to list all of the previous IDs for the most recent ID (which I'm referring to as the terminal ID). I'm assuming this will require some sort of CTE, but my brain is in a bit of a fog as to how I should set this up.
CREATE TABLE #ExampleIdCrosswalk
(
CurrentId VARCHAR(3)
,PreviousId VARCHAR(3)
,PreviousIdObsoleteDate DATE
)
INSERT INTO #ExampleIdCrosswalk
VALUES
('DEF','ABC','2021-01-01')
,('WVU','ZYX','2021-01-01')
,('MNO','ONM','2021-02-01')
,('PPP','EEE','2021-02-01')
,('GHI','DEF','2021-03-01')
,('TSR','WVU','2021-03-01')
,('NRP','QRS','2021-03-01')
,('JKL','GHI','2021-04-01')
SELECT * FROM #ExampleIdCrosswalk
Ultimately, what I'd like to show is a table with all the terminal IDs along with each of their corresponding previous IDs.
Any help would be appreciated!
You can use a recursive CTE for this:
with cte as (
select currentid, previousid
from ExampleIdCrosswalk ec
where not exists (select 1 from ExampleIdCrosswalk ec2 where ec2.previousId = ec.currentid)
union all
select cte.currentid, ec.previousid
from cte join
ExampleIdCrosswalk ec
on ec.currentId = cte.previousId
)
select *
from cte;
Here is a db<>fiddle.
You can use a recursive CTE, as in:
with
n (last, curr, prev) as (
select currentid, currentid, previousid
from ExampleIdCrosswalk where currentid not in (
select previousid from ExampleIdCrosswalk
)
union all
select n.last, c.currentid, c.previousid
from n
join ExampleIdCrosswalk c on c.currentid = n.prev
)
select last, prev
from n
order by last, prev
Result:
last prev
----- ----
JKL ABC
JKL DEF
JKL GHI
MNO ONM
NRP QRS
PPP EEE
TSR WVU
TSR ZYX
See running example at db<>fiddle.
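The recursive CTE can also be tried locally. The sketch below is an assumption-laden port to SQLite (through Python's sqlite3 module), which needs the RECURSIVE keyword and a regular table instead of a # temp table; the function name terminal_id_history and the CTE column names are made up.

```python
import sqlite3

def terminal_id_history():
    """Return (terminal_id, previous_id) pairs for the sample crosswalk (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ExampleIdCrosswalk "
        "(CurrentId TEXT, PreviousId TEXT, PreviousIdObsoleteDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO ExampleIdCrosswalk VALUES (?, ?, ?)",
        [("DEF", "ABC", "2021-01-01"), ("WVU", "ZYX", "2021-01-01"),
         ("MNO", "ONM", "2021-02-01"), ("PPP", "EEE", "2021-02-01"),
         ("GHI", "DEF", "2021-03-01"), ("TSR", "WVU", "2021-03-01"),
         ("NRP", "QRS", "2021-03-01"), ("JKL", "GHI", "2021-04-01")],
    )
    rows = conn.execute(
        """
        WITH RECURSIVE chain (terminal_id, previous_id) AS (
            -- Anchor: IDs that never appear as someone's previous ID are terminal
            SELECT ec.CurrentId, ec.PreviousId
            FROM ExampleIdCrosswalk ec
            WHERE NOT EXISTS (SELECT 1
                              FROM ExampleIdCrosswalk ec2
                              WHERE ec2.PreviousId = ec.CurrentId)
            UNION ALL
            -- Step: walk one migration further back, keeping the terminal ID
            SELECT chain.terminal_id, ec.PreviousId
            FROM chain
            JOIN ExampleIdCrosswalk ec ON ec.CurrentId = chain.previous_id
        )
        SELECT terminal_id, previous_id
        FROM chain
        ORDER BY terminal_id, previous_id
        """
    ).fetchall()
    conn.close()
    return rows

for terminal_id, previous_id in terminal_id_history():
    print(terminal_id, previous_id)
```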

SQL Server : update multiple rows one by one while incrementing id

I am pretty new to SQL. I thought I was comfortable with it after a while, but it is still tough. I am trying to increment ids. I know I could use auto-increment, but in this case each id is tied to a category, so the numbering has to start at a different value for each category, and auto-increment won't work.
The table looks something like this:
id category
----------------
1000 1
1000 1
...
2000 2
2000 2
...
And I want to make it:
id category
------------------
1000 1
1001 1
1002 1
...
2000 2
2001 2
...
I tried:
UPDATE T1
SET id = CASE
WHEN EXISTS (SELECT id FROM STYLE WHERE T1.id = id)
THEN (SELECT MAX(CAST(id AS INT)) + 1
FROM STYLE
WHERE category = T1.category)
END
FROM STYLE T1
WHERE idStyle = idStyle
But it just added 1 to all rows. How could I go 1 by 1 so it could actually get the incremented max id? Thank you.
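In the absence of real sample data, this is untested; however, something like the following should work. SQL Server does not allow window functions directly in a SET clause, so the row number is computed in a CTE and the update goes through that CTE:
WITH Numbered AS (
    SELECT id,
           -- ORDER BY (SELECT NULL): I have no context of other fields in your table
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY (SELECT NULL)) - 1 AS rn
    FROM YourTable
)
UPDATE Numbered
SET id = id + rn;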
You can use the row_number() function instead:
select s.*,
       cast(s.id as int) + row_number() over (partition by s.category order by s.id) - 1 as NewId
from style s;
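Here is a small runnable sketch of the per-category renumbering, using SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained). It computes the new ids with ROW_NUMBER() and then writes them back by rowid; the function name renumber_ids and the sample rows are made up.

```python
import sqlite3

def renumber_ids():
    """Give every row in a category a distinct, increasing id (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE style (id INTEGER, category INTEGER)")
    conn.executemany(
        "INSERT INTO style VALUES (?, ?)",
        [(1000, 1), (1000, 1), (1000, 1), (2000, 2), (2000, 2)],
    )
    # Step 1: compute base id + (row number within the category - 1)
    plan = conn.execute(
        """
        SELECT id + ROW_NUMBER() OVER (PARTITION BY category ORDER BY rowid) - 1 AS new_id,
               rowid
        FROM style
        """
    ).fetchall()
    # Step 2: apply the new ids, addressing each physical row by its rowid
    conn.executemany("UPDATE style SET id = ? WHERE rowid = ?", plan)
    result = conn.execute(
        "SELECT id, category FROM style ORDER BY category, id"
    ).fetchall()
    conn.close()
    return result

print(renumber_ids())
```

The numbering happens in one set-based pass, so there is no need to update rows one by one and re-read the maximum id each time.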

SQL - for each entry in a table - check for associated row

I have a log table which logs a start row, and a finish row for a particular event.
Each event should have a start row, and if everything goes ok it should have an end row.
But if something goes wrong then the end row may not be created.
I want to SELECT everything in the table that has a start row but not an associated end row.
For example, consider the table like this:
id event_id event_status
1 123 1
2 123 2
3 234 1
4 234 2
5 456 1
6 678 1
7 678 2
Notice that the row with id 5 (event_id 456) is a start row with no matching end row. Start is an event_status of 1, end is an event_status of 2.
How can I pull back all the event_ids which have a start row but not an end row?
This is for mssql.
You could use a not exists subquery to demand that no other row exists that ends the event:
select *
from YourTable t1
where t1.event_status = 1
and not exists
(
    select *
    from YourTable t2
    where t2.event_id = t1.event_id
    and t2.event_status = 2
)
You can try with left self join as below:
select y1.event_id from #yourevents y1 left join #yourevents y2
on y1.event_id = y2.event_id
and y1.event_status = 1
and y2.event_status = 2
where y2.event_id is null
and y1.event_status = 1
In this particular case you could use one of 3 solutions:
Solution 1. The classic
Check if there is no end status
SELECT *
FROM myTable t1
WHERE NOT EXISTS (
    SELECT *
    FROM myTable t2
    WHERE t1.event_id = t2.event_id AND t2.event_status = 2
)
Solution 2. Make it pretty. Don't do subqueries with so many parentheses
The same check, but in a more concise and pretty manner
SELECT t1.*
FROM myTable t1
LEFT JOIN myTable t2 ON t1.event_id = t2.event_id AND t2.event_status = 2
-- Doesn't exist
WHERE t2.event_id IS NULL
Solution 3. Look for the last status for each event
More flexibility in case the status logic becomes more complicated
WITH last_status AS (
    SELECT
        id,
        event_id,
        event_status,
        -- The explicit frame is required: with the default frame, last_value() only sees rows up to the current row.
        last_value(event_status) OVER (PARTITION BY event_id ORDER BY event_status ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_status
    FROM myTable
)
SELECT
    id,
    event_id,
    event_status
FROM last_status
WHERE last_status <> 2
There are more, with min/max queries and others. Pick what best suits your need for cleanliness, readability and versatility.
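To compare the approaches on the sample log from the question, here is a minimal sketch using SQLite through Python's sqlite3 module (an assumption for illustration only, since the question targets mssql). The table name event_log and the function name unfinished_events are made up.

```python
import sqlite3

def unfinished_events():
    """Return event_ids with a start row but no end row, found three ways (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log (id INTEGER PRIMARY KEY, event_id INTEGER, event_status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO event_log VALUES (?, ?, ?)",
        [(1, 123, 1), (2, 123, 2), (3, 234, 1), (4, 234, 2),
         (5, 456, 1), (6, 678, 1), (7, 678, 2)],
    )
    queries = {
        "not_exists": """
            SELECT t1.event_id
            FROM event_log t1
            WHERE t1.event_status = 1
              AND NOT EXISTS (SELECT 1 FROM event_log t2
                              WHERE t2.event_id = t1.event_id
                                AND t2.event_status = 2)
        """,
        "left_join": """
            SELECT t1.event_id
            FROM event_log t1
            LEFT JOIN event_log t2
                   ON t2.event_id = t1.event_id AND t2.event_status = 2
            WHERE t1.event_status = 1 AND t2.event_id IS NULL
        """,
        "last_value": """
            WITH ranked AS (
                SELECT event_id,
                       last_value(event_status) OVER (
                           PARTITION BY event_id ORDER BY event_status
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                       ) AS final_status
                FROM event_log
            )
            SELECT DISTINCT event_id FROM ranked WHERE final_status <> 2
        """,
    }
    results = {name: sorted(r[0] for r in conn.execute(sql))
               for name, sql in queries.items()}
    conn.close()
    return results

print(unfinished_events())
```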

Column Operations and eliminating the duplicate

I have the table below. I want to eliminate the duplicate Course from Group 2, because it is already in Group 1. Basically, if a course is mapped to Group 1, which is mandatory, we should only consider that row and not the same course in any other group. I need to check for repeating courses first and then remove the duplicate course that is not mandatory.
Program Group Course Mandatory
Program1 1 a YES
Program1 1 b YES
Program1 1 c YES
Program1 2 d NO
Program1 2 a NO
Program1 2 e NO
Program1 3 f YES
I am not able to figure out how to do operations within the same column, or my mind is not working today (:-)).
I have tried using a COUNT operation and creating a flag for duplicate rows, but cannot do it with 'Group' in the GROUP BY clause.
Output:
Program Group Course Mandatory
Program1 1 a YES
Program1 1 b YES
Program1 1 c YES
Program1 2 d NO
Program1 2 e NO
Program1 3 f YES
EDIT: How can we check for duplicate records and delete them from only one particular group?
You can use a ROW_NUMBER() function to achieve this:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY COURSE ORDER BY [Group] ) as RowRank
      FROM YourTable
     )sub
WHERE RowRank = 1
Demo: SQL Fiddle
Edit: ROW_NUMBER assigns a number to each row. Numbering will start at 1 for each grouping you assign via the PARTITION BY portion, in this case each COURSE would have a number 1 and go up. The order of the numbers is determined by the ORDER BY portion, in this case the lowest [Group] gets the 1.
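The numbering described above can be checked against the sample table with a short sketch. It uses SQLite via Python's sqlite3 module as an assumption for illustration, quotes "Group" with double quotes instead of SQL Server's brackets, and additionally partitions by Program so courses in different programs are not mixed; the table name program_course and the function name dedupe_courses are made up.

```python
import sqlite3

def dedupe_courses():
    """Keep each course only in its lowest-numbered group (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE program_course '
        '(Program TEXT, "Group" INTEGER, Course TEXT, Mandatory TEXT)'
    )
    conn.executemany(
        "INSERT INTO program_course VALUES (?, ?, ?, ?)",
        [("Program1", 1, "a", "YES"), ("Program1", 1, "b", "YES"),
         ("Program1", 1, "c", "YES"), ("Program1", 2, "d", "NO"),
         ("Program1", 2, "a", "NO"), ("Program1", 2, "e", "NO"),
         ("Program1", 3, "f", "YES")],
    )
    rows = conn.execute(
        """
        SELECT Program, "Group", Course, Mandatory
        FROM (
            SELECT *,
                   -- Row 1 in each partition is the course's lowest group
                   ROW_NUMBER() OVER (
                       PARTITION BY Program, Course ORDER BY "Group"
                   ) AS RowRank
            FROM program_course
        ) sub
        WHERE RowRank = 1
        ORDER BY "Group", Course
        """
    ).fetchall()
    conn.close()
    return rows

for row in dedupe_courses():
    print(row)
```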
Editing my answer to reflect clarified requirements.
select *
from #TableName t
where
(Mandatory = 'YES' or
not exists (
select *
from #TableName
where
Program = t.Program and
Course = t.Course and
[Group] != t.[Group] and
Mandatory = 'YES'
))
Based on your comments below, here's another sample to try:
;with group1 as (
select * from #Table where [Group] = 1
),
groups12 as (
select * from group1
union all
select * from #Table t where [Group] = 2 and not exists (select * from group1 where Program = t.Program and Course = t.Course)
),
groups123 as (
select * from groups12
union all
select * from #Table t where [Group] = 3 and not exists (select * from groups12 where Program = t.Program and Course = t.Course)
),
groups1234 as (
select * from groups123
union all
select * from #Table t where [Group] = 4 and not exists (select * from groups123 where Program = t.Program and Course = t.Course)
)
select * from groups1234
This query pulls rows for groups 1-4 in order and only when the Program/Course hasn't already appeared in a lower number group.

Repeating Subquery in DB2 update

I am currently creating an update statement which will update a bitemporal table. It does the following:
Update every Row for every ID and set the RELATION_ID to the RELATION_ID of the newest row.
This query contains a repeated subquery: I use it first to get the value for the update, and again to skip rows that already have this RELATION_ID.
Is there a way to reuse the value from the first subquery, or are there any alternatives? (Pure SQL is required, no procedural code.)
UPDATE TBL_CLIENT UPD
SET RELATION_ID = (
SELECT RELATION_ID FROM TBL_CLIENT SUBQ
WHERE UPD.ID = SUBQ.ID AND
UPDATE_TIMESTAMP = (
SELECT MAX(UPDATE_TIMESTAMP) FROM TBL_CLIENT SUBSQ
WHERE SUBSQ.ID = SUBQ.ID
)
)
WHERE ID IN (
SELECT ID
FROM TBL_CLIENT QU
GROUP BY ID
HAVING COUNT(DISTINCT(RELATION_ID)) > 1
) AND
RELATION_ID <> (
SELECT RELATION_ID FROM TBL_CLIENT SUBQ2
WHERE UPD.ID = SUBQ2.ID AND
UPDATE_TIMESTAMP = (
-- Update with the STID of the newest entry
SELECT MAX(UPDATE_TIMESTAMP) FROM TBL_CLIENT SUBSQ2
WHERE SUBSQ2.ID = SUBQ2.ID
)
)
Example:
ID / RELATION_ID / UPDATE_TIMESTAMP
1 / 10 / 1.1.2000
1 / 12 / 5.1.2002
1 / 15 / 28.3.2008
After Update:
1 / 15 / 1.1.2000
1 / 15 / 5.1.2002
1 / 15 / 28.3.2008
The last row was the most recent row, therefore its relation id was taken (and the row itself wasn't updated, which is important for another part of the query that isn't included here).
Thanks for any recommendations
You can update a view:
UPDATE
( SELECT t.id, t.update_timestamp, ... --- all the PK columns
t.relation_id,
m.relation_id AS new_relation_id
FROM
TBL_CLIENT AS t
JOIN
( SELECT id, relation_id,
ROW_NUMBER() OVER (PARTITION BY id
ORDER BY update_timestamp DESC)
AS rn
FROM TBL_CLIENT
) AS m
ON m.id = t.id
WHERE m.rn = 1
AND m.relation_id <> t.relation_id
) AS upd
SET
relation_id = new_relation_id ;
This might work for you (I don't know the exact DB2 syntax; I work on Sybase, so this follows Sybase syntax):
UPDATE TBL_CLIENT AA
SET RELATION_ID = BB.RELATION_ID
FROM
TBL_CLIENT AA,
TBL_CLIENT BB
WHERE
AA.ID=BB.ID
AND BB.UPDATE_TIMESTAMP=(SELECT MAX(UPDATE_TIMESTAMP) FROM TBL_CLIENT CC WHERE CC.ID=BB.ID)
Probably you can use a trigger with an after/before update.