Oracle get row where column value changed - sql

Say I have a table, something like
ID CCTR DATE
----- ------ ----------
1 2C 8/1/2018
2 2C 7/2/2018
3 2C 5/4/2017
4 2B 3/2/2017
5 2B 1/1/2017
6 UC 11/23/2016
There are other fields, but I've kept it simple. I have a query that orders the rows by date descending, and I want to return the row where CCTR changed. In this case it would return ID 4. Basically, I want to find the previous value of CCTR before it changed, in this case from 2B to 2C.
How do I do this? I've tried to google it, but can't seem to find the right method.

You can use the LAG() window function to peek at the previous row and compare it. If your data is:
create table t2 (
id number(6),
cctr varchar2(10),
date1 date
);
insert into t2 (id, cctr, date1) values (1, '2C', date '2018-08-01');
insert into t2 (id, cctr, date1) values (2, '2C', date '2018-07-02');
insert into t2 (id, cctr, date1) values (3, '2C', date '2017-05-04');
insert into t2 (id, cctr, date1) values (4, '2B', date '2017-03-02');
insert into t2 (id, cctr, date1) values (5, '2B', date '2017-01-01');
insert into t2 (id, cctr, date1) values (6, 'UC', date '2016-11-23');
Then the query would be:
select * from t2 where date1 = (
select max(date1)
from (
select
id, date1, cctr, lag(cctr) over(order by date1 desc) as prev
from t2
) x
where prev is not null and cctr <> prev
);
Result:
ID CCTR DATE1
------- ---------- -------------------
4 2B 2017-03-02 00:00:00

You can use the first_value analytic function to detect the changes in the CCTR column:
select fv as value, cctr
from
(
with t(ID,CCTR) as
(
select 1,'2C' from dual union all
select 2,'2C' from dual union all
select 3,'2C' from dual union all
select 4,'2B' from dual union all
select 5,'2B' from dual union all
select 6,'UC' from dual
)
select id, cctr, first_value(id) over (partition by cctr order by id ) fv
from t
order by id
)
where id = fv;
VALUE CCTR
----- ----
1 2C
4 2B
6 UC
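For comparison, here is a minimal sketch of the LAG() comparison that lists every change row, not only the most recent one. It runs the SQL through Python's bundled sqlite3 module instead of Oracle, which is an assumption made purely so the example is self-contained (window functions need SQLite 3.25 or later). Dates are stored as ISO text here, which is a choice of this sketch and not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t2 (id integer, cctr text, date1 text)")
conn.executemany(
    "insert into t2 values (?, ?, ?)",
    [
        (1, "2C", "2018-08-01"),
        (2, "2C", "2018-07-02"),
        (3, "2C", "2017-05-04"),
        (4, "2B", "2017-03-02"),
        (5, "2B", "2017-01-01"),
        (6, "UC", "2016-11-23"),
    ],
)

# Every row whose CCTR differs from the row just above it
# when the rows are ordered by date descending.
change_sql = """
    select id, cctr, prev
    from (
        select id, cctr, date1,
               lag(cctr) over (order by date1 desc) as prev
        from t2
    ) x
    where prev is not null and cctr <> prev
    order by date1 desc
"""
changes = conn.execute(change_sql).fetchall()

# The newest change is the first row of that list.
latest_change = changes[0]
print(changes)
print(latest_change)
```

Because LAG() compares neighbouring rows instead of partitioning by CCTR, this approach should also notice a value that reappears later (for example 2C, then 2B, then 2C again), which a partition by cctr would fold into a single group.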

GROUP by Largest String for all the substrings

I have a table like this where some rows have the same grp but different names. I want to group them by name such that all the substrings after removing nonalphanumeric characters are aggregated together and grouped by the largest string. The null value is considered the substring of all the strings.
grp  name     value
---  -------  -----
1    ab&c     10
1    abc d e  56
1    ab       21
1    a        23
1    xy       34
1    [null]   1
2    fgh      87
Desired result
grp  name   value
---  -----  -----
1    abcde  111
1    xy     34
2    fgh    87
My query-
Select grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g') name, sum(value) value
from table
group by grp,
regexp_replace(name,'[^a-zA-Z0-9]+', '', 'g');
Result
grp  name    value
---  ------  -----
1    abc     10
1    abcde   56
1    ab      21
1    a       23
1    xy      34
1    [null]  1
2    fgh     87
What changes should I make in my query?
To solve this problem, I did the following (all of the code below is available on the fiddle here).
CREATE TABLE test
(
grp SMALLINT NOT NULL,
name TEXT NULL,
value SMALLINT NOT NULL
);
and populate it using your data + extra for testing:
INSERT INTO test VALUES
(1, 'ab&c', 10),
(1, 'abc d e', 56),
(1, 'ab', 21),
(1, 'a', 23),
(1, NULL, 1000000),
(1, 'r*&%$s', 100), -- added for testing.
(1, 'rs__t', 101),
(1, 'rs__tu', 101),
(1, 'xy', 1111),
(1, NULL, 1000000),
(2, 'fgh', 87),
(2, 'fgh', 13), -- For Charlieface
(2, NULL, 1000000),
(2, 'x', 50),
(2, 'x', 150),
(2, 'x----y', 100);
Then, you can use this query:
WITH t1 AS
(
SELECT
grp, n_str,
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str),
CASE
WHEN
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str) IS NULL
OR
POSITION
(
LAG(n_str) OVER (PARTITION BY grp ORDER BY grp, n_str)
IN
n_str
) = 0
THEN 1
ELSE 0
END AS change,
value
FROM
test t1
CROSS JOIN LATERAL
(
VALUES
(
REGEXP_REPLACE(name,'[^a-zA-Z0-9]+', '', 'g')
)
) AS v(n_str)
WHERE n_str IS NOT NULL
), t2 AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY grp, s_change ORDER BY grp, n_str DESC) AS rn,
grp, n_str,
SUM(value) OVER (PARTITION BY grp, s_change) AS s_val,
MAX(LENGTH(n_str)) OVER (PARTITION BY grp) AS max_nom
FROM
(
SELECT
grp, n_str, change,
SUM(change) OVER (ORDER BY grp, n_str) AS s_change,
value
FROM
t1
ORDER BY grp, n_str DESC
) AS sub1
), t3 AS
(
SELECT
grp, SUM(value) AS null_sum
FROM
test
WHERE name IS NULL
GROUP BY grp
)
SELECT x.grp, x.n_str, x.s_val + y.null_sum
FROM t2 x
JOIN t3 y
ON x.max_nom = LENGTH(x.n_str) AND x.grp = y.grp
UNION
SELECT grp, n_str, s_val
FROM
t2 WHERE max_nom != LENGTH(n_str) AND rn = 1
ORDER BY grp, n_str;
Result:
grp n_str ?column?
1 abcde 2000110
1 rstu 302
1 xy 1111
2 fgh 1000100
2 xy 300
A few points to note:
Please always provide a fiddle when you ask questions such as this one with tables and data - it provides a single source of truth for the question and eliminates duplication of effort on the part of those trying to help you!
You haven't been very clear about what, exactly, should happen with NULLs - do the values count towards the SUM()? You can vary the CASE statement as required.
What happens when there's a tie in the number of characters in the string? I've included an example in the fiddle, where you get the draws - but you may wish to sort alphabetically (or some other method)?
There appears to be an error in your provided sums for the values (even taking account of counting or not values for NULL for the name field).
Finally, you don't want to GROUP BY the largest string - you want to GROUP BY the grp field, SUM() the values in the given grp records, and then pick out the longest alphanumeric string in that grouping. It would be interesting to know why you want to do this.
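To make the grouping rule concrete, here is a small Python sketch of one possible reading of it, run on the data from the question. It is not the query above. The function name group_by_largest, the alphabetical tie-break, and the choice to add NULL-named rows to the longest string in their grp are all assumptions of this sketch.

```python
import re
from collections import defaultdict

# Data from the question: (grp, name, value)
rows = [
    (1, "ab&c", 10),
    (1, "abc d e", 56),
    (1, "ab", 21),
    (1, "a", 23),
    (1, "xy", 34),
    (1, None, 1),
    (2, "fgh", 87),
]


def clean(name):
    """Strip non-alphanumeric characters; keep NULL as None."""
    return None if name is None else re.sub(r"[^a-zA-Z0-9]+", "", name)


def group_by_largest(rows):
    by_grp = defaultdict(list)
    for grp, name, value in rows:
        by_grp[grp].append((clean(name), value))

    totals = defaultdict(int)
    for grp, items in by_grp.items():
        names = {n for n, _ in items if n is not None}
        for name, value in items:
            # NULL counts as a substring of every string (assumption from the question).
            candidates = [n for n in names if name is None or name in n]
            if not candidates:
                continue  # a grp that contains only NULL names has no target
            # Longest containing string wins; ties broken alphabetically (assumption).
            target = min(candidates, key=lambda n: (-len(n), n))
            totals[(grp, target)] += value
    return sorted((g, n, v) for (g, n), v in totals.items())


result = group_by_largest(rows)
for row in result:
    print(row)
```

Under these assumptions the sketch reproduces the desired result from the question, where 111 is 10 + 56 + 21 + 23 plus the 1 from the NULL row.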

how to count the number of last day

I have data like this:
id date_ type
1 05/03/2020 A
2 07/03/2020 A
3 15/03/2020 A
4 25/03/2020 B
5 24/03/2020 B
6 31/03/2020 C
7 31/03/2020 D
I used the last_day function like this:
select last_day(date_) from table1
But I got:
31/03/2020 : 7
And I want:
31/03/2020 : 2
Thanks!
If you are looking for the count of records whose date_ is the last day of its month, then:
Schema and insert statements:
create table table1(id int, date_ date, type varchar(10));
insert into table1 values(1, '05-Mar-2020', 'A');
insert into table1 values(2, '07-Mar-2020', 'A');
insert into table1 values(3, '15-Mar-2020', 'A');
insert into table1 values(4, '25-Mar-2020', 'B');
insert into table1 values(5, '24-Mar-2020', 'B');
insert into table1 values(6, '31-Mar-2020', 'C');
insert into table1 values(7, '31-Mar-2020', 'D');
Query:
select date_, count(*)cnt
from table1
where date_ = last_day(date_)
group by date_;
Output:
DATE_      CNT
---------  ---
31-MAR-20  2
If you need every date_ with its count, there is no need to use last_day:
Query:
select date_, count(*)cnt
from table1
group by date_
order by date_;
Output:
DATE_      CNT
---------  ---
05-MAR-20  1
07-MAR-20  1
15-MAR-20  1
24-MAR-20  1
25-MAR-20  1
31-MAR-20  2
I think you want aggregation:
select date_, count(*)
from t
where date_ = last_day(date_)
group by date_;
The way I understood it, "last day" isn't the result of the LAST_DAY function, but maximum date value in that table. The result you're after is count of rows whose date is equal to that "maximum" date.
If that's so, then this might be one option: counting rows per date is easy, and the ROW_NUMBER analytic function numbers those per-date rows by date in descending order, so the row you need is the one numbered 1.
Something like this:
SQL> select date_, cnt
2 from (select date_,
3 count(*) cnt,
4 row_number() over (order by date_ desc) rn
5 from table1
6 group by date_
7 )
8 where rn = 1;
DATE_ CNT
---------- ----------
31/03/2020 2
SQL>
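The three readings discussed in the answers can be put side by side in a small sketch. It uses Python's bundled sqlite3 module instead of Oracle (an assumption made so the example runs on its own), stores dates as ISO text, and emulates LAST_DAY with SQLite's date modifiers. The variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (id integer, date_ text, type text)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        (1, "2020-03-05", "A"),
        (2, "2020-03-07", "A"),
        (3, "2020-03-15", "A"),
        (4, "2020-03-25", "B"),
        (5, "2020-03-24", "B"),
        (6, "2020-03-31", "C"),
        (7, "2020-03-31", "D"),
    ],
)

# SQLite stand-in for Oracle's LAST_DAY(date_).
LAST_DAY = "date(date_, 'start of month', '+1 month', '-1 day')"

# Grouping on LAST_DAY(date_) maps every March row to 31 March, hence 7.
grouped_on_last_day = conn.execute(
    f"select {LAST_DAY} as d, count(*) from table1 group by d"
).fetchall()

# Rows whose own date is the last day of its month.
on_month_end = conn.execute(
    f"select date_, count(*) from table1 where date_ = {LAST_DAY} group by date_"
).fetchall()

# Rows whose date equals the maximum date in the table.
on_latest_date = conn.execute(
    "select date_, count(*) from table1 "
    "where date_ = (select max(date_) from table1) group by date_"
).fetchall()

print(grouped_on_last_day)
print(on_month_end)
print(on_latest_date)
```

On this data the last two readings agree, but they would differ as soon as the newest date in the table is not a month end.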

SQL query for date intervals comparing non-adjacent rows?

I want to flag the first date in every window of at least 31 days for each ID unit in my data.
ROW ID INDEX_DATE
1 ABC 1/1/2019
2 ABC 1/7/2019
3 ABC 1/21/2019
4 ABC 2/2/2019
5 ABC 2/9/2019
6 ABC 3/6/2019
7 DEF 1/5/2019
8 DEF 2/1/2019
9 DEF 2/8/2019
The desired rows are 1, 4, 6, 7 and 9; these are either the first INDEX_DATE for the given ID, or they occur at least 31 days after the previously flagged INDEX_DATE. Every suggestion I have found uses LAG() or LEAD with window functions, but I could only get these to compare adjacent rows. Row 4, for example, needs to be compared to Row 1 in order to be identified as the first after a 31-day window has completed.
I tried the following:
Data
DROP TABLE tTest IF EXISTS;
CREATE TEMP TABLE tTest
(
ROWN INT,
ID VARCHAR(3),
INDEX_DATE DATE
) ;
GO
INSERT INTO tTEST VALUES (1, 'ABC', '1/1/2019');
INSERT INTO tTEST VALUES (2, 'ABC', '1/7/2019');
INSERT INTO tTEST VALUES (3, 'ABC', '1/21/2019');
INSERT INTO tTEST VALUES (4, 'ABC', '2/2/2019');
INSERT INTO tTEST VALUES (5, 'ABC', '2/9/2019');
INSERT INTO tTEST VALUES (6, 'ABC', '3/6/2019');
INSERT INTO tTEST VALUES (7, 'DEF', '1/5/2019');
INSERT INTO tTEST VALUES (8, 'DEF', '2/1/2019');
INSERT INTO tTEST VALUES (9, 'DEF', '2/8/2019');
GO
Query:
DROP TABLE TTEST2 IF EXISTS;
CREATE TEMP TABLE TTEST2 AS (
WITH
RN_CTE(ROWN, ID, INDEX_DATE, RN) AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY INDEX_DATE)
FROM tTEST),
MIN_CTE(ROWN, ID, INDEX_DATE, RN) AS (SELECT * FROM RN_CTE WHERE RN=1),
DIFF_CTE(ROWN,ID, INDEX_DATE, RN, DAY_DIFF) AS (
SELECT RN.*, DATE(RN.INDEX_DATE + INTERVAL '30 DAYS')
FROM RN_CTE AS RN
JOIN MIN_CTE AS MC ON RN.ID=MC.ID
WHERE RN.RN=1
OR RN.INDEX_DATE > MC.INDEX_DATE + INTERVAL '30 DAYS' ),
MIN_DIFF_CTE AS (
SELECT ID, DAY_DIFF, MIN(ROWN) AS MIN_ROW
FROM DIFF_CTE
GROUP BY ID, DAY_DIFF)
SELECT T.*
FROM MIN_DIFF_CTE AS MDC
JOIN tTEST AS T ON MDC.MIN_ROW = T.ROWN
ORDER BY ID, INDEX_DATE
);
Result:
SELECT * FROM TTEST2 ORDER BY ID, INDEX_DATE;
ROWN ID INDEX_DATE
1 ABC 2019-01-01
4 ABC 2019-02-02
5 ABC 2019-02-09
6 ABC 2019-03-06
7 DEF 2019-01-05
9 DEF 2019-02-08
Row 5 with INDEX_DATE = 2019-02-09 should not be in the output because it is less than 31 days after Row 4's INDEX_DATE.
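Something like this. The CTEs number each ID's rows by date, bucket every later row by the number of whole 31-day periods elapsed since that ID's first INDEX_DATE, and then keep the row with the minimum ROW value in each bucket.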
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, 'abc', '1/1/2019'),
(2, 'abc', '1/7/2019'),
(3, 'abc', '1/21/2019'),
(4, 'abc', '2/2/2019'),
(5, 'abc', '2/9/2019'),
(6, 'abc', '3/6/2019'),
(7, 'def', '1/5/2019'),
(8, 'def', '2/1/2019'),
(9, 'def', '2/8/2019')) V([ROW], ID, INDEX_DATE);
Query
;with
rn_cte([ROW], ID, INDEX_DATE, rn) as (
select *, row_number() over (partition by ID order by INDEX_DATE)
from #tTEST),
min_cte([ROW], ID, INDEX_DATE, rn) as (select * from rn_cte where rn=1),
diff_cte([ROW], ID, INDEX_DATE, rn, day_diff) as (
select rn.*, datediff(d, mc.INDEX_DATE, rn.INDEX_DATE)/31
from rn_cte rn
join min_cte mc on rn.ID=mc.ID
where rn.rn=1
or datediff(d, mc.INDEX_DATE, rn.INDEX_DATE)/31>0),
min_diff_cte as (
select ID, day_diff, min([ROW]) min_row
from diff_cte
group by ID, day_diff)
select t.*
from min_diff_cte mdc
join #tTEST t on mdc.min_row=t.ROW
order by 1;
Output
ROW ID INDEX_DATE
1 abc 1/1/2019
4 abc 2/2/2019
6 abc 3/6/2019
7 def 1/5/2019
9 def 2/8/2019
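The rule as worded in the question is sequential: each row is compared with the most recently flagged row for its ID, not with the first row. A minimal Python sketch of that greedy reading follows; the function name flag_window_starts and the min_gap_days parameter are made up for illustration. It is not a translation of the query above, which buckets by whole 31-day periods measured from the first date.

```python
from datetime import date

# Data from the question: (row, id, index_date)
rows = [
    (1, "ABC", date(2019, 1, 1)),
    (2, "ABC", date(2019, 1, 7)),
    (3, "ABC", date(2019, 1, 21)),
    (4, "ABC", date(2019, 2, 2)),
    (5, "ABC", date(2019, 2, 9)),
    (6, "ABC", date(2019, 3, 6)),
    (7, "DEF", date(2019, 1, 5)),
    (8, "DEF", date(2019, 2, 1)),
    (9, "DEF", date(2019, 2, 8)),
]


def flag_window_starts(rows, min_gap_days=31):
    """Return the row numbers that open a new window for their ID."""
    flagged = []
    last_flagged = {}  # ID -> date of the most recently flagged row
    for row_no, id_, index_date in sorted(rows, key=lambda r: (r[1], r[2])):
        previous = last_flagged.get(id_)
        # Flag the first row of an ID, or a row at least min_gap_days
        # after the previously flagged row of the same ID.
        if previous is None or (index_date - previous).days >= min_gap_days:
            flagged.append(row_no)
            last_flagged[id_] = index_date
    return flagged


flagged = flag_window_starts(rows)
print(flagged)
```

The two approaches give the same rows for this sample, but they can diverge on other data: with dates at day 0, 30, 40 and 62, fixed buckets flag days 0, 40 and 62, while the greedy rule flags only days 0 and 40 because day 62 is just 22 days after day 40.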

compare one row with multiple rows

Ex: I have a main table with the data below
Create table dbo.Main_Table
(
ID INT,
SDate Date
)
Insert Into dbo.Main_Table Values (1,'01/02/2018')
Insert Into dbo.Main_Table Values (2,'01/30/2018')
Create table dbo.test
(
ID INT,
SDate Date
)
Insert Into dbo.test Values (1,'01/01/2018')
Insert Into dbo.test Values (1,'01/02/2018')
Insert Into dbo.test Values (1,'01/30/2018')
Insert Into dbo.test Values (2,'10/01/2018')
Insert Into dbo.test Values (2,'01/02/2018')
Insert Into dbo.test Values (2,'01/30/2018')
I would like to compare the data in the main table with the test table. We have to join on ID, and if a matching date is found then "Yes", else "No". We have to compare one row with multiple rows.
Please let me know if you have any questions. Thanks for your help.
Something like this?
SQL> with main_table (id, sdate) as
2 (select 1, date '2018-01-02' from dual union all
3 select 2, date '2018-01-30' from dual union all
4 select 3, date '2018-07-25' from dual
5 ),
6 test_table (id, sdate) as
7 (select 1, date '2018-01-02' from dual union all
8 select 2, date '2018-08-30' from dual
9 )
10 select m.id,
11 m.sdate,
12 case when m.sdate = t.sdate then 'yes' else 'no' end status
13 from main_table m left join test_table t on t.id = m.id
14 order by m.id;
ID SDATE STATUS
---------- -------- ------
1 02.01.18 yes
2 30.01.18 no
3 25.07.18 no
SQL>
[EDIT, after reading the comment - if you find a match, you don't need that ID at all]
Here you are:
SQL> with test (id, sdate) as
2 (select 1, date '2018-01-01' from dual union all
3 select 1, date '2018-01-02' from dual union all
4 select 1, date '2018-01-30' from dual union all
5 --
6 select 2, date '2018-10-01' from dual union all
7 select 2, date '2018-01-02' from dual union all
8 select 2, date '2018-01-30' from dual
9 )
10 select id, sdate
11 from test t
12 where not exists (select null
13 from test t1
14 where t1.id = t.id
15 and t1.sdate = to_date('&par_sdate', 'yyyy-mm-dd'));
Enter value for par_sdate: 2018-01-01
ID SDATE
---------- ----------
2 2018-01-30
2 2018-01-02
2 2018-10-01
SQL> /
Enter value for par_sdate: 2018-01-02
no rows selected
SQL>
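Since each main-table row has to be checked against several test rows, an EXISTS subquery is one way to get a single yes/no per main row without the join multiplying rows. The sketch below is a hedged illustration and not the answer's method. It uses Python's bundled sqlite3 module instead of Oracle or SQL Server, stores dates as ISO text, and adds the ID 3 row from the answer's sample so that a "no" case shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table main_table (id integer, sdate text);
    create table test (id integer, sdate text);

    insert into main_table values
        (1, '2018-01-02'),
        (2, '2018-01-30'),
        (3, '2018-07-25');  -- extra row taken from the answer's sample

    insert into test values
        (1, '2018-01-01'), (1, '2018-01-02'), (1, '2018-01-30'),
        (2, '2018-10-01'), (2, '2018-01-02'), (2, '2018-01-30');
    """
)

# One output row per main_table row, however many test rows share its ID.
status = conn.execute(
    """
    select m.id, m.sdate,
           case when exists (
                    select 1
                    from test t
                    where t.id = m.id and t.sdate = m.sdate
                ) then 'yes' else 'no' end as status
    from main_table m
    order by m.id
    """
).fetchall()

for row in status:
    print(row)
```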

Oracle: how to query some columns as rows

I use Oracle 11 XE and have the following table:
CREATE TABLE tst
(val_a NUMBER,
val_b NUMBER,
val_c NUMBER,
val_sum NUMBER,
id NUMBER,
dt DATE)
Some sample data:
INSERT INTO tst
VALUES(12,15,17,44,1,TO_DATE('2018-03-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS'));
INSERT INTO tst
VALUES(14,16,11,41,1,TO_DATE('2018-03-03 00:00:00', 'YYYY-MM-DD HH24:MI:SS'));
INSERT INTO tst
VALUES(6,7,8,21,2,TO_DATE('2018-03-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS'));
I need to specify two dates and get the following result (NEW_VAL are values SUM, A, B and C for ID=1 and DT=2018-03-03, OLD_VAL are values for ID=1 and DT=2018-03-01):
ID X NEW_VAL OLD_VAL
--- --- --------- --------
1 SUM 41 44
A 14 12
B 16 15
C 11 17
Below is the query I've implemented:
select id, x, new_val, old_val from(
select tst_new.id id0, 1, tst_new.id, 'SUM' x, tst_new.val_sum new_val, tst_old.val_sum old_val from tst tst_new,
(select * from tst where dt=to_date('01.03.2018', 'dd.mm.yyyy')) tst_old
where tst_new.dt=to_date('03.03.2018', 'dd.mm.yyyy') and tst_new.id = tst_old.id and tst_new.id = 1
UNION ALL
select tst_new.id, 2, null, 'A', tst_new.val_a, tst_old.val_a from tst tst_new,
(select * from tst where dt=to_date('01.03.2018', 'dd.mm.yyyy')) tst_old
where tst_new.dt=to_date('03.03.2018', 'dd.mm.yyyy') and tst_new.id = tst_old.id and tst_new.id = 1
UNION ALL
select tst_new.id, 3, null, 'B', tst_new.val_b, tst_old.val_b from tst tst_new,
(select * from tst where dt=to_date('01.03.2018', 'dd.mm.yyyy')) tst_old
where tst_new.dt=to_date('03.03.2018', 'dd.mm.yyyy') and tst_new.id = tst_old.id and tst_new.id = 1
UNION ALL
select tst_new.id, 4, null, 'C', tst_new.val_c, tst_old.val_c from tst tst_new,
(select * from tst where dt=to_date('01.03.2018', 'dd.mm.yyyy')) tst_old
where tst_new.dt=to_date('03.03.2018', 'dd.mm.yyyy') and tst_new.id = tst_old.id and tst_new.id = 1
order by 1, 2
)
It does provide required result but looks terrible. Is there any way to get that result easier?
Also, if there is no data for the particular date, result should contain ID, X and empty cells. My query just returns nothing if there is no data for any of two dates. How to make query return empty cells if there are no values for that date?
UPDATE: I've seen examples with pivot, but in my case I need not only columns as rows, but also data from the same table for two different dates. Also, it's not clear how to get empty cells if there is no data for a particular date.
The inner subquery unpivots the value columns into rows; the outer query pivots the dates back into columns:
select *
from (select to_char(dt, 'dd.mm.yyyy') dt, vals, dt_vals from tst
unpivot (dt_vals for vals in (val_a, val_b, val_c, val_sum)))
pivot (sum(dt_vals) for dt in ('01.03.2018', '03.03.2018'))
order by 1
VALS '01.03.2018' '03.03.2018'
------- ------------ ------------
VAL_A 18 14
VAL_B 22 16
VAL_C 25 11
VAL_SUM 65 41
Next, you need to specify the rule how to filter these values:
NEW_VAL are values SUM, A B and C for ID = 1 and DT = 2018-03-03, OLD_VAL are values for ID = 1 and DT = 2018-03-01
I just hardcoded it "as is":
select *
from (select to_char(dt, 'dd.mm.yyyy') dt, vals, dt_vals from tst
unpivot (dt_vals for vals in (val_a, val_b, val_c, val_sum))
where id = 1
)
pivot (sum(dt_vals) for dt in ('01.03.2018', '03.03.2018'))
order by 1
VALS '01.03.2018' '03.03.2018'
------- ------------ ------------
VAL_A 12 14
VAL_B 15 16
VAL_C 17 11
VAL_SUM 44 41
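The question also asks for empty cells when one of the two dates has no row. The following Python sketch shows the intended shape of that result on the sample data, outside the database. The function name compare and the LABELS mapping are illustrative assumptions, and None stands in for an empty cell.

```python
# Sample data from the question: (val_a, val_b, val_c, val_sum, id, dt)
rows = [
    (12, 15, 17, 44, 1, "2018-03-01"),
    (14, 16, 11, 41, 1, "2018-03-03"),
    (6, 7, 8, 21, 2, "2018-03-01"),
]

COLUMNS = ("val_a", "val_b", "val_c", "val_sum")
# Output label -> source column, in the order the question lists them.
LABELS = [("SUM", "val_sum"), ("A", "val_a"), ("B", "val_b"), ("C", "val_c")]


def compare(rows, id_, new_dt, old_dt):
    """Unpivot the value columns for one ID and two dates.

    A date with no row yields None in that column instead of
    dropping the whole result.
    """
    by_key = {(r[4], r[5]): dict(zip(COLUMNS, r[:4])) for r in rows}
    new = by_key.get((id_, new_dt), {})
    old = by_key.get((id_, old_dt), {})
    return [(label, new.get(col), old.get(col)) for label, col in LABELS]


both_dates = compare(rows, 1, "2018-03-03", "2018-03-01")
missing_new = compare(rows, 2, "2018-03-03", "2018-03-01")
print(both_dates)
print(missing_new)
```

For ID 1 this gives the NEW_VAL and OLD_VAL pairs from the question; for ID 2, which has no row on 2018-03-03, the new values come back as None while the labels and old values are still present.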