Convert 1 column to 2 columns in Oracle SQL

We get the data in the following format, which I am able to split using a regular-expression query. The data consists of the start and end dates of tasks, concatenated with pipes.
Data:
|2020/04/26|2020/05/02|2020/05/03|2020/05/10
Query:
select REGEXP_SUBSTR (:p, '[^|]+', 1, level) as periods from dual
connect by level <= length (regexp_replace(:p, '[^|]+'))
Result:
2020/04/26
2020/05/02
2020/05/03
2020/05/10
We need to separate it into start date and end date columns. The number of start/end date pairs is dynamic, but every start date will have an end date, so we won't get nulls.
Expected Result
START DATE END DATE
2020/04/26 2020/05/02
2020/05/03 2020/05/10
Thanks in advance.

You could use arithmetic and conditional aggregation:
select
max(case when mod(lvl, 2) = 0 then periods end) start_date,
max(case when mod(lvl, 2) = 1 then periods end) end_date
from (
select
regexp_substr (:p, '[^|]+', 1, level) as periods,
level - 1 as lvl
from dual
connect by level <= length (regexp_replace(:p, '[^|]+'))
) t
group by trunc(lvl / 2)
Demo on DB Fiddle:
START_DATE | END_DATE
:--------- | :---------
2020/04/26 | 2020/05/02
2020/05/03 | 2020/05/10
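To see why the grouping in that query pairs the values correctly, it can help to look at the arithmetic on its own. The following is a minimal sketch, not part of the original answer: pair_no and slot are illustrative names, and the literal 4 stands in for the number of tokens in the sample string.

```sql
-- Illustration only: how each zero-based token index (lvl) maps to a pair.
-- pair_no is the GROUP BY key; slot 0 is the start date, slot 1 the end date.
select level - 1              as lvl,
       trunc((level - 1) / 2) as pair_no,
       mod(level - 1, 2)      as slot
from dual
connect by level <= 4   -- 4 = number of tokens in the sample string
order by lvl;
```

For the four sample tokens this yields pair_no 0, 0, 1, 1 and slot 0, 1, 0, 1, so each group holds exactly one start date and one end date.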

This solution works whether you have one or many input rows. (Your hierarchical query generates an exponentially increasing number of duplicate rows if you feed it more than one row of data.)
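The duplication mentioned above can be seen with a small sketch. It assumes two made-up input rows and uses CONNECT BY LEVEL without any PRIOR condition, as the question's query does; the CTE name sample_rows and its values are purely illustrative.

```sql
-- Illustration only: CONNECT BY LEVEL with no PRIOR condition on a two-row source.
-- At level 2, every level-1 row is joined to every source row.
with sample_rows (id, value) as (
  select 1, '|a|b' from dual union all
  select 2, '|c|d' from dual
)
select id, level as lvl
from sample_rows
connect by level <= 2
order by lvl, id;
```

This returns 6 rows rather than 4: 2 rows at level 1 and 4 rows at level 2. With more rows or deeper levels the count grows accordingly.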
Convert pairs of dates to XML and then use XMLTABLE to convert:
SELECT id,
x.*
FROM test_data t
CROSS JOIN
XMLTABLE(
( LTRIM(
REGEXP_REPLACE(
t.value,
'\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})',
',<row><start>\1</start><end>\2</end></row>'
),
','
)
)
COLUMNS
start_date DATE PATH '/row/start',
end_date DATE PATH '/row/end'
) x
So, for your test data:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '|2020/04/26|2020/05/02|2020/05/03|2020/05/10' FROM DUAL UNION ALL
SELECT 2, '|2020/06/01|2020/06/02' FROM DUAL
This outputs:
ID | START_DATE | END_DATE
-: | :--------- | :--------
1 | 26-APR-20 | 02-MAY-20
1 | 03-MAY-20 | 10-MAY-20
2 | 01-JUN-20 | 02-JUN-20
db<>fiddle here
Or, if you only have a single input, you can split your data on pairs:
SELECT REGEXP_SUBSTR ( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})', 1, level, NULL, 1 ) as start_date,
REGEXP_SUBSTR ( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})', 1, level, NULL, 2 ) as end_date
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( :p, '\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})' )
Which outputs:
START_DATE | END_DATE
:--------- | :---------
2020/04/26 | 2020/05/02
2020/05/03 | 2020/05/10
db<>fiddle here
Or use:
SELECT *
FROM XMLTABLE(
( LTRIM(
REGEXP_REPLACE(
:p,
'\|(\d{4}/\d{2}/\d{2})\|(\d{4}/\d{2}/\d{2})',
',<row><start>\1</start><end>\2</end></row>'
),
','
)
)
COLUMNS
start_date DATE PATH '/row/start',
end_date DATE PATH '/row/end'
)
db<>fiddle here

Related

Oracle SQL count dates that do not exist between range of dates

I have range of dates:
| date |
| -------- |
| 1/1/2022 |
| 2/1/2022 |
| 3/1/2022 |
| 5/1/2022 |
| 6/1/2022 |
| 7/1/2022 |
| 8/1/2022 |
| 10/1/2022 |
I want to find the dates that are missing between these dates (in this case 4/1 and 9/1) and count them (in this case 2). In other words, I want the count of dates that do not exist within a specific range of dates. How can I achieve that?
select (max(date) - min(date) + 1) - count(distinct date)
from table_name
https://dbfiddle.uk/cSKZloYA
(max(date) - min(date) + 1) will give the total number of days in the range.
count(distinct date) will be the number of existing (different) days in the table.
The difference between these is the number of non-existing days.
Note: date is a reserved word, so if it's the actual column name, it has to be delimited as "date". (https://en.wikipedia.org/wiki/List_of_SQL_reserved_words)
You can use the LAG analytic function to find the previous date and work out the difference in days; if the difference is more than 1, the gap contains (difference - 1) missing days:
SELECT SUM(missing_dates) AS num_missing
FROM (
SELECT GREATEST("DATE" - LAG("DATE") OVER (ORDER BY "DATE") - 1, 0)
AS missing_dates
FROM table_name
);
Which, for the sample data:
CREATE TABLE table_name ("DATE") AS
SELECT DATE '2020-01-01' FROM DUAL UNION ALL
SELECT DATE '2020-01-02' FROM DUAL UNION ALL
SELECT DATE '2020-01-03' FROM DUAL UNION ALL
SELECT DATE '2020-01-05' FROM DUAL UNION ALL
SELECT DATE '2020-01-06' FROM DUAL UNION ALL
SELECT DATE '2020-01-07' FROM DUAL UNION ALL
SELECT DATE '2020-01-08' FROM DUAL UNION ALL
SELECT DATE '2020-01-10' FROM DUAL;
Outputs:
NUM_MISSING
2
fiddle
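The question also asks for the missing dates themselves, not only their count. One possible sketch generates every day in the range and subtracts the days that exist. It assumes the sample values are consecutive days in January 2022 with no time component, and it renames the column to dt (a hypothetical name chosen to avoid the reserved word).

```sql
-- Sketch only: list the dates missing between the earliest and latest date.
with table_name (dt) as (
  select date '2022-01-01' from dual union all
  select date '2022-01-02' from dual union all
  select date '2022-01-03' from dual union all
  select date '2022-01-05' from dual union all
  select date '2022-01-06' from dual union all
  select date '2022-01-07' from dual union all
  select date '2022-01-08' from dual union all
  select date '2022-01-10' from dual
),
bounds as (
  -- single row holding the first and last date of the range
  select min(dt) as lo, max(dt) as hi from table_name
)
select lo + level - 1 as missing_dt   -- every day from lo to hi
from bounds
connect by level <= hi - lo + 1
minus
select dt from table_name;            -- remove the days that exist
```

Under those assumptions the query returns 4 January 2022 and 9 January 2022; wrapping it in SELECT COUNT(*) gives the same count of 2 as the queries above.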

Stack several rows into one with date condition

I've got raw data from a table with information about clients. The information comes from different sources, which causes duplicates with different dates:
id pp type start_dt end_dt
100| 1 | Y | 01.05.19 | 01.10.20
100| 1 | Y | 10.08.20 | 01.10.20
100| 1 | N | 01.10.20 | 02.12.21
100| 1 | N | 13.12.20 | 02.12.21
100| 1 | Y | 02.12.21 | 02.12.26
100| 1 | Y | 20.12.21 | 20.12.26
For example, in this table rows 2, 4 and 6 each have a start date between the "start_dt" and "end_dt" of the previous row. Each is a duplicate, but I need to combine the min start date and the max end date of both rows for the type.
FYI: the first two rows and the last two rows have the same id, pp and type, but I need to stack them separately because of the timeline.
What I want to get (continuous timeline for a client is a key):
id pp type start_dt end_dt | cnt
100| 1 | Y | 01.05.19 | 01.10.20 | 2
100| 1 | N | 01.10.20 | 02.12.21 | 2
100| 1 | Y | 02.12.21 | 20.12.26 | 2
I'm using PL/SQL. I think it could be solved by window functions, but I can't figure out which functions to use.
I tried to solve it with GROUP BY and a HAVING ... > 1 condition, but that stacks the four rows with the same type (rows 1, 2 and 5, 6) into one. I need two separate rows for that type, preserving the continuous timeline of dates for the client.
From Oracle 12, you can use MATCH_RECOGNIZE for row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id, pp
ORDER BY start_dt
MEASURES
FIRST(type) AS type,
FIRST(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
PATTERN (overlapping* last_row)
DEFINE
overlapping AS type = NEXT(type)
AND MAX(end_dt) >= NEXT(start_dt)
)
Which, for the sample data:
CREATE TABLE table_name (id, pp, type, start_dt, end_dt) AS
SELECT 100, 1, 'Y', DATE '2019-05-01', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2020-08-10', DATE '2020-10-01' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-10-01', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'N', DATE '2020-12-13', DATE '2021-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-02', DATE '2026-12-02' FROM DUAL UNION ALL
SELECT 100, 1, 'Y', DATE '2021-12-20', DATE '2026-12-20' FROM DUAL;
Outputs:
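ID | PP | TYPE | START_DT | END_DT | CNT
--: | -: | :--- | :------------------ | :------------------ | --:
100 | 1 | Y | 2019-05-01 00:00:00 | 2020-10-01 00:00:00 | 2
100 | 1 | N | 2020-10-01 00:00:00 | 2021-12-02 00:00:00 | 2
100 | 1 | Y | 2021-12-02 00:00:00 | 2026-12-20 00:00:00 | 2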
fiddle
If you want to use analytic and aggregation functions then it is a bit more complicated:
SELECT id, pp, type,
MIN(start_dt) AS start_dt,
MAX(end_dt) AS end_dt,
COUNT(*) AS cnt
FROM (
SELECT id, pp, type, start_dt, end_dt,
SUM(grp_change) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
) AS grp
FROM (
SELECT t.*,
CASE
WHEN start_dt <= MAX(end_dt) OVER (
PARTITION BY id, pp, type
ORDER BY start_dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
THEN 0
ELSE 1
END AS grp_change
FROM table_name t
)
)
GROUP BY id, pp, type, grp
ORDER BY id, pp, start_dt
fiddle
I prefer this version because comparing "type = next(type)" without "type" being in the "order by" may lead to errors.
select *
from table_name
match_recognize(
partition by id, pp, type
order by start_dt,end_dt
measures first(start_dt) as start_dt, max(end_dt) as end_dt, count(*) as n
pattern (merged* strt)
define
merged as max(end_dt) >= next(start_dt)
)

Selecting distinct timestamps from an ORACLE database

I am trying to limit the amount of data I pull in prior to processing/analysing it in Python, mainly due to memory constraints.
Each transaction results in ~3-4 different transaction_events.
-----------
trx_id timestamp
trx_1 | 2021.01.01 15:45:40
trx_1_2 | 2021.01.01 15:45:40
trx_1_3 | 2021.01.01 15:45:40
trx_2 | 2021.02.01 14:15:40
trx_2_2 | 2021.02.01 14:15:40
trx_2_3 | 2021.02.01 14:15:40
All I need is 1 record per timestamp.
-----------
trx_id timestamp
trx_1 | 2021.01.01 15:45:40
trx_2 | 2021.02.01 14:15:40
I've already tried the suggestions on the Oracle community forum and in the question "select distinct(date) return the same date several time".
I've tried several variations too:
SELECT DISTINCT TRUNC(timestamp, 'DD')
SELECT DISTINCT TRUNC(timestamp)
SELECT DISTINCT to_char(timestamp, 'yyyy-mm-dd')
However, with no success.
You can use the ROW_NUMBER analytic function and partition by the first 5 characters of the TRX_ID and the timestamp:
SELECT trx_id, ts
FROM (
SELECT t.*,
ROW_NUMBER() OVER (
PARTITION BY SUBSTR( trx_id, 1, 5 ), ts ORDER BY trx_id
) AS rn
FROM table_name t
)
WHERE rn = 1;
Which, for your sample data:
CREATE TABLE table_name ( trx_id, ts ) AS
SELECT 'trx_1', TIMESTAMP '2021-01-01 15:45:40' FROM DUAL UNION ALL
SELECT 'trx_1_2', TIMESTAMP '2021-01-01 15:45:40' FROM DUAL UNION ALL
SELECT 'trx_1_3', TIMESTAMP '2021-01-01 15:45:40' FROM DUAL UNION ALL
SELECT 'trx_2', TIMESTAMP '2021-02-01 14:15:40' FROM DUAL UNION ALL
SELECT 'trx_2_2', TIMESTAMP '2021-02-01 14:15:40' FROM DUAL UNION ALL
SELECT 'trx_2_3', TIMESTAMP '2021-02-01 14:15:40' FROM DUAL;
Outputs:
TRX_ID | TS
:----- | :----------------------------
trx_1 | 2021-01-01 15:45:40.000000000
trx_2 | 2021-02-01 14:15:40.000000000
If you can have other TRX_ID with different length patterns then you can look for the second underscore character and get the substring before that:
SELECT trx_id, ts
FROM (
SELECT t.*,
ROW_NUMBER() OVER (
PARTITION BY CASE INSTR( trx_id, '_', 1, 2 )
WHEN 0
THEN trx_id
ELSE SUBSTR( trx_id, 1, INSTR( trx_id, '_', 1, 2 ) - 1 )
END,
ts
ORDER BY trx_id
) AS rn
FROM table_name t
)
WHERE rn = 1;
db<>fiddle here
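To make the second-underscore logic concrete, here is a small sketch on two made-up identifiers. The value 'trx_10_2' and the aliases second_underscore and base_id are illustrative assumptions, not part of the original data.

```sql
-- Illustration only: what INSTR and SUBSTR return for ids of different lengths.
select trx_id,
       instr(trx_id, '_', 1, 2) as second_underscore,  -- 0 when there is no second '_'
       case instr(trx_id, '_', 1, 2)
         when 0 then trx_id
         else substr(trx_id, 1, instr(trx_id, '_', 1, 2) - 1)
       end as base_id
from (
  select 'trx_1' as trx_id from dual union all
  select 'trx_10_2' from dual
);
```

For 'trx_1' the position is 0 and base_id stays 'trx_1'; for 'trx_10_2' the position is 7 and base_id becomes 'trx_10', which a fixed five-character SUBSTR would have cut to 'trx_1'.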
You can use aggregation:
select min(trx_id), timestamp
from t
group by timestamp;
Here is a db<>fiddle.

Converting data having unix timestamps randomly in data of one column of Oracle

E.g., one row in one column contains the data below. I need to convert the Unix timestamps to actual datetimes in Oracle. Please provide a SQL query to do the conversion.
NOTE: {EOT} is the special character End of Transmission.
{ETX} is the special character End of Text.
1550226213{EOT}Bharath
testtest{ETX}1550226559{EOT}LakshmanUpdate to Current Summary: {EOT}Under Investigation{EOT}
suresh{ETX}1550227918S{EOT}itaUpdate to Current Summary: {EOT}Outage restored- Under Observation
{ETX}1550301176{EOT}Rama
Assuming that the times are all at the start of each line within the string, you can use a recursive subquery-factoring clause to iterate over each line of the string and regular expressions to find the time at the start of that line. Then all you need to do is add that number of seconds, as an interval, to the epoch the time is measured from (i.e. 1970-01-01):
Oracle Setup:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '1550226213Bharath
1550226559LakshmanUpdate to Current Summary: Under Investigation
1550227918SitaUpdate to Current Summary: Outage restored- Under Observation
1550301176Rama' FROM DUAL UNION ALL
SELECT 2, '0ABC' FROM DUAL UNION ALL
SELECT 3, NULL FROM DUAL UNION ALL
SELECT 4, '1234567890A
1234567891B
1234567892C' FROM DUAL;
Query:
WITH lines ( id, value, unix_time, description, line_no, total_lines ) AS (
SELECT id,
value,
TO_NUMBER( REGEXP_SUBSTR( value, '^\d+', 1, 1, 'm' ) ),
REGEXP_SUBSTR( value, '^\d+(.*)$', 1, 1, 'm', 1 ),
1,
COALESCE( REGEXP_COUNT( value, '^\d+', 1, 'm' ), 0 )
FROM test_data
UNION ALL
SELECT id,
value,
TO_NUMBER( REGEXP_SUBSTR( value, '^\d+', 1, line_no + 1, 'm' ) ),
REGEXP_SUBSTR( value, '^\d+(.*)$', 1, line_no + 1, 'm', 1 ),
line_no + 1,
total_lines
FROM lines
WHERE line_no < total_lines
)
SELECT id,
DATE '1970-01-01' + unix_time * INTERVAL '1' SECOND AS time,
description
FROM lines
ORDER BY id, line_no;
Output:
ID | TIME | DESCRIPTION
-: | :------------------ | :----------------------------------------------------------------
1 | 2019-02-15 10:23:33 | Bharath
1 | 2019-02-15 10:29:19 | LakshmanUpdate to Current Summary: Under Investigation
1 | 2019-02-15 10:51:58 | SitaUpdate to Current Summary: Outage restored- Under Observation
1 | 2019-02-16 07:12:56 | Rama
2 | 1970-01-01 00:00:00 | ABC
3 | null | null
4 | 2009-02-13 23:31:30 | A
4 | 2009-02-13 23:31:31 | B
4 | 2009-02-13 23:31:32 | C
db<>fiddle here
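The epoch arithmetic can be checked in isolation. The sketch below converts the first timestamp from the sample data in two ways: the interval multiplication used in the query above and NUMTODSINTERVAL as an alternative. The column aliases are illustrative, and the result is in UTC because Unix time counts seconds from 1970-01-01 00:00:00 UTC.

```sql
-- Illustration only: convert one epoch value taken from the sample data.
select date '1970-01-01' + 1550226213 * interval '1' second      as via_multiplication,
       date '1970-01-01' + numtodsinterval(1550226213, 'SECOND') as via_numtodsinterval
from dual;
```

Both expressions evaluate to 2019-02-15 10:23:33, matching the first output row above; how the value is displayed depends on the session's NLS_DATE_FORMAT.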
Update:
Oracle Setup:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '1550226213{EOT}Bharath
testtest{ETX}1550226559{EOT}LakshmanUpdate to Current Summary: {EOT}Under Investigation{EOT}
suresh{ETX}1550227918{EOT}itaUpdate to Current Summary: {EOT}Outage restored- Under Observation
{ETX}1550301176{EOT}Rama' FROM DUAL UNION ALL
SELECT 2, '0{EOT}ABC' FROM DUAL UNION ALL
SELECT 3, NULL FROM DUAL UNION ALL
SELECT 4, '1234567890{EOT}A{ETX}1234567891{EOT}B{ETX}1234567892{EOT}C' FROM DUAL;
Query:
WITH lines ( id, value, line, line_no, total_lines ) AS (
SELECT id,
value,
REGEXP_SUBSTR( value, '(.+?)(\{ETX\}|$)', 1, 1, 'n', 1 ),
1,
COALESCE( REGEXP_COUNT( value, '(.+?)(\{ETX\}|$)', 1, 'n' ), 0 )
FROM test_data
UNION ALL
SELECT id,
value,
REGEXP_SUBSTR( value, '(.+?)(\{ETX\}|$)', 1, line_no + 1, 'n', 1 ),
line_no + 1,
total_lines
FROM lines
WHERE line_no < total_lines
)
SELECT id,
DATE '1970-01-01' + TO_NUMBER( REGEXP_SUBSTR( line, '^(\d+)(\{EOT\}|$)', 1, 1, 'n', 1 ) ) * INTERVAL '1' SECOND AS time,
REGEXP_SUBSTR( line, '^(\d+)\{EOT\}(.*)$', 1, 1, 'n', 2 ) AS description
FROM lines
ORDER BY id, line_no;
Output:
ID | TIME | DESCRIPTION
-: | :------------------ | :-------------------------------------------------------------------------
1 | 2019-02-15 10:23:33 | Bharath<br>testtest
1 | 2019-02-15 10:29:19 | LakshmanUpdate to Current Summary: {EOT}Under Investigation{EOT}<br>suresh
1 | 2019-02-15 10:51:58 | itaUpdate to Current Summary: {EOT}Outage restored- Under Observation<br>
1 | 2019-02-16 07:12:56 | Rama
2 | 1970-01-01 00:00:00 | ABC
3 | null | null
4 | 2009-02-13 23:31:30 | A
4 | 2009-02-13 23:31:31 | B
4 | 2009-02-13 23:31:32 | C
db<>fiddle here

select periods from date

I have a problem picking out, from a list of absences, those that follow one another and grouping them into periods.
date_from (data_od) date_to(data_do)
--------------------------
18/08/01 - 18/08/15
18/08/16 - 18/08/20
18/08/21 - 18/08/31
18/09/01 - 18/09/08
18/05/01 - 18/05/31
18/06/01 - 18/06/30
18/03/01 - 18/03/18
18/02/14 - 18/02/28
Above is the list of absences; the result should be this table:
date_from (data_od) date_to(data_do)
--------------------------
18/08/01 18/09/08
18/05/01 18/06/30
18/02/14 18/03/18
So far I have done something like this, but it only joins absences in pairs :(
SELECT u1.data_od,u2.data_do
FROM l_absencje u1 CROSS APPLY
(SELECT * FROM l_absencje labs
WHERE labs.prac_id=u1.prac_id AND
TRUNC(labs.data_od) = TRUNC(u1.data_do)+1
ORDER BY id DESC FETCH FIRST 1 ROWS ONLY
) u2 where u1.prac_id=1067 ;
And it gives me this:
18/08/01 18/08/20 bad
18/08/16 18/08/31 bad
18/08/21 18/09/08 bad
18/05/01 18/06/30 good
18/02/14 18/03/18 good
You can use a combination of the LAG(), LEAD() and LAST_VALUE() analytic functions:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE absences ( date_from, date_to ) AS
SELECT DATE '2018-08-01', DATE '2018-08-15' FROM DUAL UNION ALL
SELECT DATE '2018-08-16', DATE '2018-08-20' FROM DUAL UNION ALL
SELECT DATE '2018-08-21', DATE '2018-08-31' FROM DUAL UNION ALL
SELECT DATE '2018-09-01', DATE '2018-09-08' FROM DUAL UNION ALL
SELECT DATE '2018-05-01', DATE '2018-05-31' FROM DUAL UNION ALL
SELECT DATE '2018-06-01', DATE '2018-06-30' FROM DUAL UNION ALL
SELECT DATE '2018-03-01', DATE '2018-03-18' FROM DUAL UNION ALL
SELECT DATE '2018-02-14', DATE '2018-02-28' FROM DUAL;
Query 1:
SELECT *
FROM (
SELECT CASE
WHEN date_to IS NOT NULL
THEN LAST_VALUE( date_from ) IGNORE NULLS
OVER( ORDER BY ROWNUM )
END AS date_from,
date_to
FROM (
SELECT CASE date_from
WHEN LAG( date_to ) OVER ( ORDER BY date_to )
+ INTERVAL '1' DAY
THEN NULL
ELSE date_from
END AS date_from,
CASE date_to
WHEN LEAD( date_from ) OVER ( ORDER BY date_from )
- INTERVAL '1' DAY
THEN NULL
ELSE date_to
END AS date_to
FROM absences
)
)
WHERE date_from IS NOT NULL
AND date_to IS NOT NULL
Results:
| DATE_FROM | DATE_TO |
|----------------------|----------------------|
| 2018-02-14T00:00:00Z | 2018-03-18T00:00:00Z |
| 2018-05-01T00:00:00Z | 2018-06-30T00:00:00Z |
| 2018-08-01T00:00:00Z | 2018-09-08T00:00:00Z |
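For comparison, the same result can be sketched with a running-sum ("start of group" flag) approach, which is a different technique from the LAG/LEAD/LAST_VALUE query above. It assumes absences never overlap and that two absences are adjacent exactly when one starts the day after the previous one ends; sample_absences, is_new and grp are illustrative names.

```sql
-- Sketch only: flag each row that starts a new period, then number the periods
-- with a running sum of the flags and aggregate per period.
with sample_absences (date_from, date_to) as (
  select date '2018-08-01', date '2018-08-15' from dual union all
  select date '2018-08-16', date '2018-08-20' from dual union all
  select date '2018-08-21', date '2018-08-31' from dual union all
  select date '2018-09-01', date '2018-09-08' from dual union all
  select date '2018-05-01', date '2018-05-31' from dual union all
  select date '2018-06-01', date '2018-06-30' from dual union all
  select date '2018-03-01', date '2018-03-18' from dual union all
  select date '2018-02-14', date '2018-02-28' from dual
)
select min(date_from) as date_from,
       max(date_to)   as date_to
from (
  select date_from, date_to,
         sum(is_new) over (order by date_from) as grp   -- period number
  from (
    select date_from, date_to,
           -- 0 when this absence starts the day after the previous one ends
           case
             when date_from = lag(date_to) over (order by date_from) + 1 then 0
             else 1
           end as is_new
    from sample_absences
  )
)
group by grp
order by min(date_from);
```

On the sample rows this produces the same three periods: 2018-02-14 to 2018-03-18, 2018-05-01 to 2018-06-30 and 2018-08-01 to 2018-09-08.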