Most efficient way to select record if a value has changed - sql

What would be the most efficient way to select a record when one of its values has changed?
Ex:
I have an account history table like the one below, where a record is created whenever the account changes:
Id AcctNb Active Created
8 123456 1 01/03/2012
6 123456 0 01/01/2012
I would like to find an efficient way to return the records where the active status has changed since the last entry.
UPDATE
The query I am using at the moment works, but inefficiently:
select d1.acctNb,d1.active, d2.active
from d044 d1 , d044 d2
where d1.created = '2012-04-14'
and d1.acctNb = d2.acctNb
and d2.created = (select max(d.created) from d044 d where d.acctNb = d2.acctNb and d.id != d1.id)
and (d1.active != d2.active)

Try this:
create table log
(
log_id int identity(1,1) primary key,
acct_id int not null,
active bit not null,
created datetime not null
);
insert into log(acct_id, active,created)
values
(1,1,'January 1, 2012'),
(1,1,'January 2, 2012'),
(1,0,'January 3, 2012'),
(1,0,'January 4, 2012'),
(1,1,'January 5, 2012'),
(2,0,'February 1, 2012'),
(2,1,'February 2, 2012'),
(2,0,'February 3, 2012'),
(2,1,'February 4, 2012'),
(2,1,'February 5, 2012');
The solution:
with serialize as
(
select row_number()
over(partition by acct_id order by created) rx,
*
from log
)
select ds.acct_id,
ds.active ds_active,
pr.active pr_active,
ds.created
from serialize ds -- detect second row
join serialize pr -- previous row
on pr.acct_id = ds.acct_id
and ds.rx = pr.rx + 1
where ds.rx >= 2 and
pr.active <> ds.active
Query output: January 3, January 5, February 2, February 3, February 4
Those are the dates on which a change in active was detected.
The logic: starting from each account's second row, compare the row with the one before it; if their active values differ (via WHERE pr.active <> ds.active), the row appears in the results.
Live test: http://sqlfiddle.com/#!3/68136/4
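For anyone who wants to try the ROW_NUMBER self-join without a SQL Server instance, here is a minimal sketch that runs the same idea through Python's sqlite3 module. It assumes an SQLite build with window functions (3.25 or later); the function name find_changes_row_number and the ISO date strings are my own choices for illustration, not part of the answer above.

```python
import sqlite3

def find_changes_row_number(rows):
    """Return (acct_id, created, previous_active, active) for every row whose
    active flag differs from the previous row of the same account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table log (acct_id int, active int, created text)")
    conn.executemany("insert into log values (?, ?, ?)", rows)
    sql = """
        with serialize as (
            select row_number() over (partition by acct_id order by created) as rx,
                   acct_id, active, created
            from log
        )
        select ds.acct_id, ds.created, pr.active, ds.active
        from serialize ds            -- current row
        join serialize pr            -- previous row of the same account
          on pr.acct_id = ds.acct_id and ds.rx = pr.rx + 1
        where pr.active <> ds.active
        order by ds.acct_id, ds.created
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# Same sample rows as in the answer, with ISO dates so text ordering is chronological.
sample = [
    (1, 1, "2012-01-01"), (1, 1, "2012-01-02"), (1, 0, "2012-01-03"),
    (1, 0, "2012-01-04"), (1, 1, "2012-01-05"),
    (2, 0, "2012-02-01"), (2, 1, "2012-02-02"), (2, 0, "2012-02-03"),
    (2, 1, "2012-02-04"), (2, 1, "2012-02-05"),
]
changes = find_changes_row_number(sample)
```

With the sample rows, changes holds five rows: January 3 and 5 for account 1, and February 2, 3 and 4 for account 2.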

There are two ways:
1)
Add a column
update_tsmp timestamp
Put a trigger on the table that runs after update or insert
-- checks the Active field
-- if it has changed, sets update_tsmp to the current timestamp
You must then define what "since the last entry" means in order to decide whether to return the record.
2)
Create a history table
Id AcctNb Active Created change_tsmp updating_user delete_flag
Put a trigger on the table that runs before update or delete
-- copies the record to the history table, setting the delete flag as appropriate
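A rough sketch of the trigger idea, using SQLite through Python only because it is easy to run anywhere. The table names account and account_history, the trigger name, and the history columns are hypothetical. Trigger syntax also differs between database engines, so treat this as an illustration of the pattern rather than code to paste into SQL Server or Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (
        acct_nb integer primary key,
        active  integer not null
    );
    create table account_history (
        acct_nb    integer not null,
        old_active integer not null,
        new_active integer not null,
        changed_at text    not null
    );
    -- Fires on every UPDATE that sets active, but the WHEN clause
    -- only lets through rows whose value actually changed.
    create trigger account_active_changed
    after update of active on account
    for each row
    when old.active <> new.active
    begin
        insert into account_history (acct_nb, old_active, new_active, changed_at)
        values (old.acct_nb, old.active, new.active, datetime('now'));
    end;
""")

conn.execute("insert into account values (123456, 0)")
conn.execute("update account set active = 0 where acct_nb = 123456")  # same value, not logged
conn.execute("update account set active = 1 where acct_nb = 123456")  # real change, logged
history = conn.execute(
    "select acct_nb, old_active, new_active from account_history"
).fetchall()
```

Only the second update produces a history row, so finding changed accounts becomes a plain select on account_history instead of a self-join over the whole log.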

If a future version of SQL Server adds the LAG windowing function, you can simplify the comparison of the previous row with the current one by using LAG.
This already works on PostgreSQL (since 8.4), which has the LAG and LEAD windowing functions:
create table log
(
log_id serial primary key,
acct_id int not null,
active boolean not null,
created timestamp not null
);
insert into log(acct_id, active,created)
values
(1,true,'January 1, 2012'),
(1,true,'January 2, 2012'),
(1,false,'January 3, 2012'),
(1,false,'January 4, 2012'),
(1,true,'January 5, 2012'),
(2,false,'February 1, 2012'),
(2,true,'February 2, 2012'),
(2,false,'February 3, 2012'),
(2,true,'February 4, 2012'),
(2,true,'February 5, 2012');
The LAG approach is simpler than the ROW_NUMBER and JOIN combination:
with merge_prev as
(
select
acct_id, created,
lag(active) over(partition by acct_id order by created) pr_active, -- previous row's active
active sr_active -- second row's active
from log
)
select *
from merge_prev
where
pr_active <> sr_active
Live test: http://sqlfiddle.com/#!1/b1eb0/25
EDIT
LAG is already available on SQL Server 2012: http://sqlfiddle.com/#!6/d17c0/1
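The original question filtered on a single created date, so here is a hedged sketch of how LAG could be combined with that filter. It uses SQLite via Python (window functions need SQLite 3.25+). The helper name changed_on and the sample rows are made up for illustration.

```python
import sqlite3

def changed_on(conn, day):
    """Accounts whose entry on `day` has a different active flag
    than their previous entry."""
    sql = """
        with merge_prev as (
            select acct_id, created, active,
                   lag(active) over (partition by acct_id order by created) as pr_active
            from log
        )
        select acct_id, pr_active, active
        from merge_prev
        -- The date filter sits outside the CTE so that lag() still sees earlier rows.
        where created = ? and pr_active <> active
        order by acct_id
    """
    return conn.execute(sql, (day,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("create table log (acct_id int, active int, created text)")
conn.executemany("insert into log values (?, ?, ?)", [
    (1, 1, "2012-01-01"), (1, 0, "2012-01-03"),   # changed on the 3rd
    (2, 1, "2012-01-02"), (2, 1, "2012-01-03"),   # unchanged on the 3rd
    (3, 0, "2012-01-03"),                          # first entry, nothing to compare with
])
result = changed_on(conn, "2012-01-03")
```

An account's first entry has a NULL pr_active, so the <> comparison is not true and the row is dropped without any extra condition.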


sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write SQL that produces the output attached to this question, along with sample data.
There are two tables: one with distinct IDs (PK) and their current flag,
and another with an Active ID and an Inactive ID (both FKs to the PK of the first table).
The final output should return two columns: the first consists of all distinct IDs from the first table, and the second should contain the Active ID from the second table.
Below is the sql:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It doesn't work where an ID was once active and then became inactive.
Please note:
the active ID returned should be the most recent active ID
an ID that doesn't have any active ID should return either null or the ID itself
where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, with two IDs 6 and 7, when 6 is active 7 is inactive and vice versa; the only way to know the most current active state is by the update date
The attached sample might be easier to understand.
It looks like I might have to use a recursive CTE to achieve the results. Can someone please help?
Thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
-- Temp tables, so the #tb_id and #tb_merges references below resolve.
create table #tb_id (id bigint, [current] bit);
create table #tb_merges (active_id bigint, inactive_id bigint, update_dt datetime2);
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6
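To make the recursive step easier to experiment with, here is a stripped-down sketch of the same search in SQLite, driven from Python. It leaves out the current flag and the update_dt ranking from the answer above and assumes the merge chains contain no cycles (a cycle reachable from an endpoint would recurse forever with UNION ALL). The function name resolve_merges is hypothetical.

```python
import sqlite3

def resolve_merges(ids, merges):
    """Map every id to the endpoint of its merge chain (None if it was never merged)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tb_id (id integer primary key)")
    conn.execute("create table tb_merges (active_id integer, inactive_id integer)")
    conn.executemany("insert into tb_id values (?)", [(i,) for i in ids])
    conn.executemany("insert into tb_merges values (?, ?)", merges)
    sql = """
        with recursive search(active_id, inactive_id) as (
            -- Base case: merges whose active_id was never merged away itself.
            select m.active_id, m.inactive_id
            from tb_merges m
            where not exists (select 1 from tb_merges m2
                              where m2.inactive_id = m.active_id)
            union all
            -- Recursive case: carry the endpoint down to ids that were merged
            -- into an id we have already resolved.
            select s.active_id, m.inactive_id
            from tb_merges m
            join search s on m.active_id = s.inactive_id
        )
        select i.id, s.active_id
        from tb_id i
        left join search s on s.inactive_id = i.id
        order by i.id
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

resolved = resolve_merges(
    [1, 2, 3, 6, 7, 8, 9, 10],
    [(3, 1), (1, 2), (6, 7), (7, 8), (8, 9), (9, 10)],
)
```

Here ids 1 and 2 resolve to 3, ids 7 to 10 resolve to 6, and the endpoints 3 and 6 themselves come back with None, matching the shape of the results table above.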

How would you find the 'GOOD' ID when cancellation is involved?

Suppose you have the following schema:
CREATE TABLE Data
(
ID INT,
CXL INT
)
INSERT INTO Data (ID, CXL)
SELECT 1, NULL
UNION
SELECT 2, 1
UNION
SELECT 3, 2
UNION
SELECT 5, 3
UNION
SELECT 6, NULL
UNION
SELECT 7, NULL
UNION
SELECT 8, 7
The CXL column holds the ID that the row cancels. So, for example, ID:1 was good until it was cancelled by ID:2 (the row with ID:2 has CXL = 1). ID:2 was good until it was cancelled by ID:3, and ID:3 was good until it was cancelled by ID:5, so in this sequence the last "GOOD" ID is ID:5.
I would like to find all the "GOOD" IDs. In this example they would be:
Latest GOOD ID
5
6
8
Here's a fiddle if you want to play with this:
http://sqlfiddle.com/#!6/68ac48/1
SELECT D.ID
FROM Data D
WHERE NOT EXISTS(SELECT 1
FROM Data WHERE D.ID = CXL)
select Id
from data
where Id not in (select cxl from data where cxl is not null)
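The "where cxl is not null" in the second query matters more than it looks. A small sketch in SQLite via Python, using the question's data, shows what happens when it is left out. The variable names are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table data (id integer, cxl integer)")
conn.executemany(
    "insert into data values (?, ?)",
    [(1, None), (2, 1), (3, 2), (5, 3), (6, None), (7, None), (8, 7)],
)

def ids(sql):
    """Run a single-column query and return the values as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT EXISTS: NULLs in cxl never match d.id, so they are harmless.
not_exists = ids("""
    select d.id from data d
    where not exists (select 1 from data c where c.cxl = d.id)
    order by d.id
""")

# NOT IN with the NULLs filtered out gives the same answer.
not_in_filtered = ids("""
    select id from data
    where id not in (select cxl from data where cxl is not null)
    order by id
""")

# NOT IN over a list containing NULL is never true, so nothing comes back.
not_in_unfiltered = ids("""
    select id from data
    where id not in (select cxl from data)
    order by id
""")
```

The first two lists are [5, 6, 8]; the third is empty, because comparing against a NULL in the list yields unknown rather than true.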

Creating a dynamic Pivot table in Oracle SQL

I currently have a static pivot query that is working.
WITH ROW_SET AS
(
SELECT DEFKEY.CF_DEFERRAL_DURATION, KEYACM.CF_KEYWORD_ID, KEYACM.CF_PERIOD, KEYACM.CF_VALUE,
row_number () OVER ( PARTITION BY KEYACM.CF_KEYWORD_ID
ORDER BY KEYACM.CF_PERIOD desc
) AS r_num
FROM AMEXIV.MAS_CFUS_KEYACM_CONTROLDATA KEYACM
JOIN AMEXIV.MAS_CFUS_DEFKEY_CONTROLDATA DEFKEY
ON DEFKEY.CF_MODEL_ID = KEYACM.CF_MODEL_ID
AND DEFKEY.CF_KEYWORD_ID = CONCAT(KEYACM.CF_KEYWORD_ID, '_accum_deferral')
)
SELECT *
FROM ROW_SET
PIVOT ( MIN (ROW_SET.CF_VALUE) AS VALUE
, MIN (ROW_SET.CF_PERIOD) AS PERIOD
FOR r_num IN ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
) KEYACM
I'd like to change the line FOR r_num IN (1, 2, 3, ...) so that it instead uses the value from the field DEFKEY.CF_DEFERRAL_DURATION. That field holds the number of months over which the calculation should run. At present the calculation is correct only for records that have a deferral duration of 12 months, but other records have 6 months.
Is there a way to read this value from the DEFKEY.CF_DEFERRAL_DURATION field?

Query for historical data

I'm trying to write a query against a PostgreSQL database but I can't figure out how to do it. I have a historical table that stores the status of an object, the date, and other data. Something like this:
id objectid date status ....
----------------------------
9 12 date1 2
8 12 date2 2
7 12 date3 2 <-- This is the date where the last status was set
6 12 date4 1
5 12 date5 1
4 12 date6 6
3 12 date7 6
2 12 date8 2
1 12 date9 2
I need to get the date on which the current status (the last one set for each object) was set, for all the objects in the system (objectid). In the example (I have only included rows for object 12 to simplify), ordered chronologically (date9 is the oldest and date1 is the most recent), the current status is 2 and I want to get date3, which is when this status was last set. Notice that status 2 was set earlier, but then it changed to 6, then to 1, and then to 2 again. I want the first date in the current interval.
Can anyone tell me how to construct this query, or which way to go?
Thanks.
UPDATE: query according to @Unreason's answer, so that it can be joined to the table containing the objects that objectid references
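SELECT objectid, max(date)
FROM (
SELECT objectid, status, date, lag(status) OVER window1, last_value(status) OVER window1 last_status
FROM historical
-- explicit frame: with the default frame last_value() only sees rows up to the current one
WINDOW window1 AS ( PARTITION BY objectid ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) x
WHERE (status <> lag OR lag IS NULL)
AND status = last_status
GROUP BY objectid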
There are many ways to do this; windowing functions come to mind.
Something like:
SELECT max(date)
FROM (
SELECT status, date, lag(status) OVER window1, last_value(status) OVER window1 last_status
FROM historical
WHERE objectid = 12
-- explicit frame: with the default frame last_value() only sees rows up to the current one
WINDOW window1 AS ( ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) x
WHERE (status <> lag OR lag IS NULL)
AND status = last_status;
Notes:
using keywords as field names (such as lag and date) should be avoided
there are many other ways to write this query
currently it works for one object (objectid = 12), but it could be modified to return the date of last status for each object
EDIT
Test data
CREATE TABLE historical (
id integer,
objectid integer,
date date,
status integer
);
INSERT INTO historical VALUES (1, 12, '2000-01-01', 2);
INSERT INTO historical VALUES (2, 12, '2001-01-01', 2);
INSERT INTO historical VALUES (3, 12, '2002-01-01', 6);
INSERT INTO historical VALUES (4, 12, '2003-01-01', 6);
INSERT INTO historical VALUES (5, 12, '2004-01-01', 1);
INSERT INTO historical VALUES (6, 12, '2005-01-01', 1);
INSERT INTO historical VALUES (7, 12, '2006-01-01', 2);
INSERT INTO historical VALUES (8, 12, '2007-01-01', 2);
INSERT INTO historical VALUES (9, 12, '2008-01-01', 2);
In future you might want to post the output of
pg_dump -t table_name --inserts
so that it is easier to set up a test case.
select min(date)
from historical
where status = 2
and objectid = 12
and date >
(select max(date)
from historical
where status <> 2
and objectid = 12) -- restrict the subquery to the same object
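For the "all objects at once" case, one possible simplification is to rely on LAG alone: the most recent row whose status differs from the row before it (or that has no row before it) is, by construction, the start of the current interval. Below is a sketch in SQLite via Python (window functions need SQLite 3.25+). The column is named status_date here to sidestep the keyword issue mentioned in the notes, and object 13 is invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table historical (id integer, objectid integer, status_date text, status integer)"
)
conn.executemany(
    "insert into historical values (?, ?, ?, ?)",
    [
        (1, 12, "2000-01-01", 2), (2, 12, "2001-01-01", 2),
        (3, 12, "2002-01-01", 6), (4, 12, "2003-01-01", 6),
        (5, 12, "2004-01-01", 1), (6, 12, "2005-01-01", 1),
        (7, 12, "2006-01-01", 2), (8, 12, "2007-01-01", 2),
        (9, 12, "2008-01-01", 2),
        # Hypothetical second object whose status never changed.
        (10, 13, "2010-01-01", 5), (11, 13, "2011-01-01", 5),
    ],
)

current_since = conn.execute("""
    select objectid, max(status_date)
    from (
        select objectid, status_date, status,
               lag(status) over (partition by objectid order by status_date) as prev_status
        from historical
    ) x
    -- Keep only rows that start a new run of the same status.
    where prev_status is null or status <> prev_status
    group by objectid
    order by objectid
""").fetchall()
```

For object 12 this gives 2006-01-01, the row with id 7 in the test data, and for object 13 it falls back to its first row.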

In MYSQL, how can I select multiple rows and have them returned in the order I specified?

I know I can select multiple rows like this:
select * FROM table WHERE id in (1, 2, 3, 10, 100);
And I get the results returned in order: 1, 2, 3, 10, 100
But, what if I need to have the results returned in a specific order? When I try this:
select * FROM table WHERE id in (2, 100, 3, 1, 10);
I still get the results returned in the same order: 1, 2, 3, 10, 100
Is there a way to get the results returned in the exact order that I ask for?
(There are limitations due to the way the site is set up that won't allow me to ORDER BY using another field value)
The way you worded that, I'm not sure whether using ORDER BY is completely impossible or only ordering by some other field is, so at the risk of submitting a useless answer, this is how you'd typically order your results in such a situation:
SELECT *
FROM table
WHERE id in (2, 100, 3, 1, 10)
ORDER BY FIELD (id, 2, 100, 3, 1, 10)
Unless you are able to do ORDER BY, there is no guaranteed way.
The sort you are getting is due to the way MySQL executes the query: it combines all range scans over the ranges defined by the IN list into a single range scan.
Usually, you force the order using one of these ways:
Create a temporary table with the value and the sorter, fill it with your values and order by the sorter:
CREATE TABLE t_values (value INT NOT NULL PRIMARY KEY, sorter INT NOT NULL);
INSERT
INTO t_values
VALUES
(2, 1),
(100, 2),
(3, 3),
(1, 4),
(10, 5);
SELECT m.*
FROM t_values v
JOIN mytable m
ON m.id = v.value
ORDER BY
sorter
Do the same with an in-place rowset:
SELECT m.*
FROM (
SELECT 2 AS value, 1 AS sorter
UNION ALL
SELECT 100 AS value, 2 AS sorter
UNION ALL
SELECT 3 AS value, 3 AS sorter
UNION ALL
SELECT 1 AS value, 4 AS sorter
UNION ALL
SELECT 10 AS value, 5 AS sorter
) v
JOIN mytable m
ON m.id = v.value
ORDER BY
sorter
Use CASE clause:
SELECT *
FROM mytable m
WHERE id IN (1, 2, 3, 10, 100)
ORDER BY
CASE id
WHEN 2 THEN 1
WHEN 100 THEN 2
WHEN 3 THEN 3
WHEN 1 THEN 4
WHEN 10 THEN 5
END
You can impose an order, but only based on the value(s) of one or more columns.
To get the rows back in the order you specify in the example, you would need to add a second column, called a "sortkey", whose values can be used to sort the rows into the desired sequence using the ORDER BY clause. In your example:
Value Sortkey
----- -------
1 4
2 1
3 3
10 5
100 2
select value FROM table where ... order by sortkey;
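If the list of ids comes from application code, the CASE expression from the earlier answer can be generated rather than typed by hand. Here is a minimal sketch using Python and SQLite. The helper fetch_in_order, the table mytable and its name column are assumptions for illustration. The same generated SQL shape should also be accepted by MySQL, though there FIELD() is shorter.

```python
import sqlite3

def fetch_in_order(conn, ids):
    """Return rows of mytable for the given ids, in exactly the order of `ids`."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    # One WHEN per id; the position in the list becomes the sort key.
    whens = " ".join(f"when ? then {pos}" for pos, _ in enumerate(ids))
    sql = (
        f"select id, name from mytable where id in ({placeholders}) "
        f"order by case id {whens} end"
    )
    # The ids are bound twice: once for the IN list, once for the CASE.
    return conn.execute(sql, list(ids) + list(ids)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, name text)")
conn.executemany(
    "insert into mytable values (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (10, "d"), (100, "e")],
)
ordered = fetch_in_order(conn, [2, 100, 3, 1, 10])
```

The ids are passed as bound parameters and only the integer positions are interpolated into the SQL text, so the generated statement stays safe even if the list comes from user input.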