expand oracle table fields linearly - sql

I have the following table in oracle:
ID field_1 field_2
1 1-5 1-5
1 20-30 55-65
2 1-8 10-17
2 66-72 80-86
I need to convert this table to the following format where field_1 and field_2 must be matched linearly:
ID field_1 field_2
1 1 1
1 2 2
1 3 3
1 4 4
1 5 5
1 20 55
1 21 56
1 22 57
1 23 58
1 24 59
1 25 60
1 26 61
1 27 62
1 28 63
1 29 64
1 30 65
2 1 10
2 2 11
2 3 12
2 4 13
2 5 14
2 6 15
2 7 16
2 8 17
2 66 80
2 67 81
2 68 82
2 69 83
2 70 84
2 71 85
2 72 86
What is the easiest and fastest way to accomplish this, knowing that the original table contains thousands of records?

The lateral clause, used below, has been available since Oracle 12.1. For older versions, a connect by hierarchical query is still probably the fastest option, but it needs to be written with a bit more care (and it will be slower than connect by inside a lateral join).
Of course, the big assumption is that the inputs are always in the form number-dash-number, and that the difference between the upper and the lower bound is the same in the two columns, for each row. I am not even attempting to check for that.
select t.id, l.field_1, l.field_2
from   mytable t,
       lateral (select to_number(substr(field_1, 1, instr(field_1, '-') - 1))
                       + level - 1 as field_1,
                       to_number(substr(field_2, 1, instr(field_2, '-') - 1))
                       + level - 1 as field_2
                from   dual
                connect by level <=
                       to_number(substr(field_1, instr(field_1, '-') + 1))
                       - to_number(substr(field_1, 1, instr(field_1, '-') - 1)) + 1
               ) l
;
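As a sanity check, the INSTR/SUBSTR arithmetic the query relies on can be reproduced outside the database. A minimal Python sketch (row data copied from the question; the function name is mine):

```python
# Hedged sketch: the same 'lo-hi' parsing and CONNECT BY LEVEL expansion
# the SQL performs, reproduced in plain Python on the question's sample.

def expand(row_id, field_1, field_2):
    """Expand one row; both fields are 'lo-hi' strings spanning equal lengths."""
    lo1, hi1 = (int(p) for p in field_1.split('-'))
    lo2, _ = (int(p) for p in field_2.split('-'))
    # level runs 1 .. hi1 - lo1 + 1, exactly like CONNECT BY LEVEL
    return [(row_id, lo1 + lvl - 1, lo2 + lvl - 1)
            for lvl in range(1, hi1 - lo1 + 2)]

rows = [(1, '1-5', '1-5'), (1, '20-30', '55-65'),
        (2, '1-8', '10-17'), (2, '66-72', '80-86')]
expanded = [t for r in rows for t in expand(*r)]
print(len(expanded))  # 31 rows, matching the expected output
```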

One option uses a recursive query. Starting with 11gR2, Oracle supports standard recursive common table expressions, so you can do:
with cte (id, field_1, field_2, max_field_1, max_field_2) as (
    select
        id,
        to_number(regexp_substr(field_1, '^\d+')),
        to_number(regexp_substr(field_2, '^\d+')),
        to_number(regexp_substr(field_1, '\d+$')),
        to_number(regexp_substr(field_2, '\d+$'))
    from mytable
    union all
    select
        id,
        field_1 + 1,
        field_2 + 1,
        max_field_1,
        max_field_2
    from cte
    where field_1 < max_field_1
)
select id, field_1, field_2 from cte order by id, field_1;
This assumes that intervals on the same row always have the same length, as shown in your sample data. If that's not the case, you would need to explain how you want to handle that.
Demo on DB Fiddle:
ID | FIELD_1 | FIELD_2
-: | ------: | ------:
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 | 4 | 4
1 | 5 | 5
1 | 20 | 55
1 | 21 | 56
1 | 22 | 57
1 | 23 | 58
1 | 24 | 59
1 | 25 | 60
1 | 26 | 61
1 | 27 | 62
1 | 28 | 63
1 | 29 | 64
1 | 30 | 65
2 | 1 | 10
2 | 2 | 11
2 | 3 | 12
2 | 4 | 13
2 | 5 | 14
2 | 6 | 15
2 | 7 | 16
2 | 8 | 17
2 | 66 | 80
2 | 67 | 81
2 | 68 | 82
2 | 69 | 83
2 | 70 | 84
2 | 71 | 85
2 | 72 | 86
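The recursive CTE is standard SQL, so its logic can be checked on any engine with recursive CTE support. A minimal sketch running it on SQLite through Python, with INSTR/SUBSTR standing in for REGEXP_SUBSTR (which SQLite lacks); sample data is from the question:

```python
import sqlite3

# Hedged sketch: the recursive-CTE approach ported to SQLite so it can
# be run anywhere; splitting uses INSTR/SUBSTR instead of regexes.
con = sqlite3.connect(':memory:')
con.executescript("""
create table mytable (id int, field_1 text, field_2 text);
insert into mytable values
  (1,'1-5','1-5'), (1,'20-30','55-65'),
  (2,'1-8','10-17'), (2,'66-72','80-86');
""")
rows = con.execute("""
with recursive cte(id, field_1, field_2, max_field_1) as (
  select id,
         cast(substr(field_1, 1, instr(field_1, '-') - 1) as int),
         cast(substr(field_2, 1, instr(field_2, '-') - 1) as int),
         cast(substr(field_1, instr(field_1, '-') + 1) as int)
  from mytable
  union all
  select id, field_1 + 1, field_2 + 1, max_field_1
  from cte
  where field_1 < max_field_1
)
select id, field_1, field_2 from cte order by id, field_1
""").fetchall()
print(len(rows))  # 31 rows, matching the expected output
```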

You can use a cross join with generated values as follows:
SELECT id,
       to_number(regexp_substr(field_1, '[0-9]+', 1, 1)) + column_value - 1 AS field_1,
       to_number(regexp_substr(field_2, '[0-9]+', 1, 1)) + column_value - 1 AS field_2
FROM   your_table
CROSS JOIN table(cast(multiset(
           select level from dual
           connect by level <= to_number(regexp_substr(field_1, '[0-9]+', 1, 2))
                             - to_number(regexp_substr(field_1, '[0-9]+', 1, 1))
                             + 1
       ) as sys.odcinumberlist))
ORDER BY 1, 2;

Related

RANK data by value in the column

I'd like to divide the data into separate groups (chunks) based on the value in a column. If the value increases above a certain threshold, the value in the "group" column should increase by 1.
This would be easy to achieve in MySQL with CASE WHEN #val > 30 THEN #row_no + 1 ELSE #row_no END; however, I am using Amazon Redshift, where this is not allowed.
Sample fiddle: http://sqlfiddle.com/#!15/00b3aa/6
Suggested output:
ID | Value | Group
-: | ----: | ----:
 1 |    11 |     1
 2 |    11 |     1
 3 |    22 |     1
 4 |    11 |     1
 5 |    35 |     2
 6 |    11 |     2
 7 |    11 |     2
 8 |    11 |     2
 9 |    66 |     3
10 |    11 |     3
A cumulative sum should do what you want:
SELECT *, sum((val >= 30)::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM mydata ORDER BY id;
id | val | sum
----+-----+-----
1 | 11 | 0
2 | 11 | 0
3 | 22 | 0
4 | 11 | 0
5 | 35 | 1
6 | 11 | 1
7 | 11 | 1
8 | 11 | 1
9 | 66 | 2
10 | 11 | 2
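The window sum is easy to cross-check by hand. A minimal Python sketch of the same running counter (values and the threshold of 30 taken from the question; the result is 0-based, so add 1 to match the suggested output):

```python
# Hedged sketch: the running group counter the cumulative window sum
# computes, reproduced in plain Python on the question's sample values.
vals = [11, 11, 22, 11, 35, 11, 11, 11, 66, 11]
group, groups = 0, []
for v in vals:
    if v >= 30:          # mirrors (val >= 30)::integer inside sum(...)
        group += 1
    groups.append(group)
print(groups)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```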

SQL query for output [closed]

Closed 3 years ago.
I have two tables. T1 has a single column, Id, with 5 values:
1
2
3
4
5
Table T2 has 3 columns WeekNo, Id (same as first table) and count and the data is like:
40 1 10
40 2 11
41 1 13
41 2 12
41 3 14
42 1 16
42 2 15
42 3 17
42 4 18
42 5 19
I am trying to write one query that will give output like:
40 1 10
40 2 11
40 3 0
40 4 0
40 5 0
41 1 13
41 2 12
41 3 14
41 4 0
41 5 0
42 1 16
42 2 15..
Use a partitioned outer join:
SELECT t2.weekno,
       t1.id,
       COALESCE(t2.cnt, 0) AS cnt
FROM   T1
       LEFT OUTER JOIN T2 PARTITION BY (T2.weekno)
       ON (T1.id = T2.id);
outputs:
WEEKNO | ID | CNT
-----: | -: | --:
40 | 1 | 10
40 | 2 | 11
40 | 3 | 0
40 | 4 | 0
40 | 5 | 0
41 | 1 | 13
41 | 2 | 12
41 | 3 | 14
41 | 4 | 0
41 | 5 | 0
42 | 1 | 16
42 | 2 | 15
42 | 3 | 17
42 | 4 | 18
42 | 5 | 19
db<>fiddle here
In standard SQL, you would generate the rows and then left join the values:
select w.weekno, t1.id, coalesce(t2.cnt, 0)
from (select distinct weekno from t2) w
cross join t1
left join t2
       on t2.weekno = w.weekno and t2.id = t1.id;
I assume the partition by clause has better performance than this rewrite. Note that it is an Oracle-only feature.
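The standard-SQL rewrite runs on most engines. A minimal sketch verifying it on SQLite through Python (table and column names follow the question; T2's count column is named cnt here because count is awkward as an identifier):

```python
import sqlite3

# Hedged sketch: the generate-and-left-join rewrite, run on SQLite.
con = sqlite3.connect(':memory:')
con.executescript("""
create table t1 (id int);
insert into t1 values (1),(2),(3),(4),(5);
create table t2 (weekno int, id int, cnt int);
insert into t2 values (40,1,10),(40,2,11),(41,1,13),(41,2,12),(41,3,14),
                      (42,1,16),(42,2,15),(42,3,17),(42,4,18),(42,5,19);
""")
rows = con.execute("""
select w.weekno, t1.id, coalesce(t2.cnt, 0) as cnt
from (select distinct weekno from t2) w
cross join t1
left join t2 on t2.weekno = w.weekno and t2.id = t1.id
order by w.weekno, t1.id
""").fetchall()
print(rows[:5])  # week 40: ids 3, 4 and 5 get cnt 0
```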

select two values and the two values with which the former have minimum distance

I have three columns
Key, x1, y1
1 31 34
2 43 40
3 41 44
4 100 40
My expected output is:
Key, x1, y1, closest_x1, closest_y1
1 31 34 43 40
2 43 40 41 44
3 41 44 43 40
4 100 40 41 44
What is the simplest SQL query to produce the expected output?
Note that both values x1, y1 are considered when finding the closest pair.
A possible solution for your use case would be to self-join the table, using a NOT EXISTS clause to ensure that the record being joined is the closest possible record to the current record.
SELECT t1.*, t2.x AS closest_x, t2.y AS closest_y
FROM mytable t1
INNER JOIN mytable t2
        ON t2.k <> t1.k
       AND NOT EXISTS (
               SELECT 1
               FROM mytable t3
               WHERE t3.k <> t1.k
                 AND abs(t1.x - t3.x) + abs(t1.y - t3.y)
                   < abs(t1.x - t2.x) + abs(t1.y - t2.y)
           )
ORDER BY 1;
Notes:
I renamed the table fields to k(ey), x and y for readability.
As suggested by you, this uses the Manhattan (taxicab) distance, abs(x2 - x1) + abs(y2 - y1), not the Euclidean distance.
Please note that if several records are smallest and equidistant, multiple rows will be returned.
View on DB Fiddle:
| k | x | y | closest_x | closest_y |
| --- | --- | --- | --------- | --------- |
| 1 | 31 | 34 | 43 | 40 |
| 2 | 43 | 40 | 41 | 44 |
| 3 | 41 | 44 | 43 | 40 |
| 4 | 100 | 40 | 43 | 40 |
I think that in line 4 of your expected result the closest values are wrong.
It should be:
4 100 40 43 40
at least this is the result I get with this:
select t.*, tt.x1 closest_x1, tt.y1 closest_y1
from tablename t
inner join tablename tt
        on tt.key = (
             select min(key)
             from tablename
             where power(x1 - t.x1, 2) + power(y1 - t.y1, 2) = (
                   select min(power(x1 - t.x1, 2) + power(y1 - t.y1, 2))
                   from tablename
                   where key <> t.key
             )
           )
order by t.key
Results:
| key | x1 | y1 | closest_x1 | closest_y1 |
| ---- | --- | --- | ---------- | ---------- |
| 1 | 31 | 34 | 43 | 40 |
| 2 | 43 | 40 | 41 | 44 |
| 3 | 41 | 44 | 43 | 40 |
| 4 | 100 | 40 | 43 | 40 |
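Both answers' join-based queries reduce to a brute-force nearest-neighbour search. A minimal Python sketch using the Manhattan distance (on this sample the Euclidean metric used in the second answer picks the same neighbours):

```python
# Hedged sketch: brute-force closest-pair check on the question's points,
# using the Manhattan distance abs(dx) + abs(dy).
points = {1: (31, 34), 2: (43, 40), 3: (41, 44), 4: (100, 40)}

def closest(k):
    """Key of the nearest other point under the Manhattan distance."""
    x, y = points[k]
    return min((kk for kk in points if kk != k),
               key=lambda kk: abs(points[kk][0] - x) + abs(points[kk][1] - y))

print({k: points[closest(k)] for k in points})
```

This confirms the second answer's observation: for key 4, the closest point is (43, 40), not (41, 44).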

SQL/REGEX puzzle/challenge How to convert ASCII art ranges with multiple characters to relational data?

The motivation here was to easily and accurately generate data samples for the nested ranges challenge.
A table contains a single column of text type.
The text contains one or more lines, where each line contains one or more sections built from letters.
The goal is to write a query that returns a tuple for each section with its start point, end point and value.
Data sample
create table t (txt varchar (1000));
insert into t (txt) values
(
'
AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC
DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII
JJ KKKLLL MM NN OOOOO
P QQ
'
)
;
Requested results
* Only the last 3 columns (section start/end/val) are required, the rest are for debugging purposes.
line_ind section_ind section_length section_start section_end section_val
1 1 28 1 28 A
1 2 4 31 34 B
1 3 25 39 63 C
2 1 3 1 3 D
2 2 1 4 4 E
2 3 8 7 14 F
2 4 9 19 27 G
2 5 8 43 50 H
2 6 7 55 61 I
3 1 2 1 2 J
3 2 3 9 11 K
3 3 3 12 14 L
3 4 2 22 23 M
3 5 2 25 26 N
3 6 5 57 61 O
4 1 1 13 13 P
4 2 2 60 61 Q
Teradata
Currently regexp_split_to_table doesn't seem to support zero-length expressions (I've created incident RECGZJKZV). To overcome this limitation I'm using regexp_replace to push a space between adjacent sequences of letters, e.g. KKKLLL becomes KKK LLL.
with l
as
(
select line_ind
,line
from table
(
regexp_split_to_table (-1,t.txt,'\r','')
returns (minus_one int,line_ind int,line varchar(1000))
)
as l
)
select l.line_ind
,r.section_ind
,char_length (r.section) as section_length
,regexp_instr (l.line,'(\S)\1*',1,r.section_ind,0) as section_start
,regexp_instr (l.line,'(\S)\1*',1,r.section_ind,1) - 1 as section_end
,substr (r.section,1,1) as section_val
from table
(
regexp_split_to_table (l.line_ind,regexp_replace (l.line,'(?<=(?P<c>.))(?!(?P=c))',' '),'\s+','')
returns (line_ind int,section_ind int,section varchar(1000))
)
as r
,l
where l.line_ind =
r.line_ind
order by l.line_ind
,r.section_ind
;
Oracle
select regexp_instr (txt,'(\S)\1*',1,level,0) - instr (txt,chr(10),regexp_instr (txt,'(\S)\1*',1,level,0) - length (txt) - 1,1) as section_start
,regexp_instr (txt,'(\S)\1*',1,level,1) - 1 - instr (txt,chr(10),regexp_instr (txt,'(\S)\1*',1,level,0) - length (txt) - 1,1) as section_end
,regexp_substr (txt,'(\S)\1*',1,level,'',1) as section_val
from t
connect by level <= regexp_count (txt,'(\S)\1*')
;
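The core of both queries is the pattern (\S)\1*, which matches one maximal run of a repeated non-space character. A minimal Python sketch of the same per-line run detection (the two-line sample text is mine, not from the question):

```python
import re

# Hedged sketch: find each maximal run of one repeated non-space
# character per line, with 1-based start/end positions, mirroring
# what REGEXP_INSTR/REGEXP_SUBSTR do with the pattern (\S)\1*.
txt = "AAAA BB\nC DDD"
sections = [(ln + 1, m.start() + 1, m.end(), m.group()[0])
            for ln, line in enumerate(txt.split('\n'))
            for m in re.finditer(r'(\S)\1*', line)]
print(sections)  # [(1, 1, 4, 'A'), (1, 6, 7, 'B'), (2, 1, 1, 'C'), (2, 3, 5, 'D')]
```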
Oracle
This will work even if you have multiple input rows:
WITH lines ( txt, id, line, pos, line_no ) AS(
SELECT txt,
id,
REGEXP_SUBSTR( txt, '.*', 1, 1 ),
REGEXP_INSTR( txt, '.*', 1, 1, 1 ),
1
FROM t
UNION ALL
SELECT txt,
id,
REGEXP_SUBSTR( txt, '.*', pos + 1, 1 ),
REGEXP_INSTR( txt, '.*', pos + 1, 1, 1 ),
line_no + 1
FROM lines
WHERE pos > 0
),
words ( id, line, line_no, section_start, section_end, section_value ) AS (
SELECT id,
line,
line_no,
REGEXP_INSTR( line, '(\S)\1*', 1, 1, 0 ),
REGEXP_INSTR( line, '(\S)\1*', 1, 1, 1 ) - 1,
REGEXP_SUBSTR( line, '(\S)\1*', 1, 1, NULL, 1 )
FROM lines
WHERE pos > 0
AND line IS NOT NULL
UNION ALL
SELECT id,
line,
line_no,
REGEXP_INSTR( line, '(\S)\1*', section_end + 1, 1, 0 ),
REGEXP_INSTR( line, '(\S)\1*', section_end + 1, 1, 1 ) - 1,
REGEXP_SUBSTR( line, '(\S)\1*', section_end + 1, 1, NULL, 1 )
FROM words
WHERE section_end > 0
)
SELECT id,
line_no,
section_start,
section_end,
section_value
FROM words
WHERE section_end > 0
ORDER BY id, line_no, section_start
So, for the input data (with an added id column to be able to easily differentiate the pieces of text):
create table t (id NUMBER(5,0), txt varchar (1000));
insert into t (id, txt) values
(
1,
'
AAAAAAAAAAAAAAAAAAAAAAAAAAAA BBBB CCCCCCCCCCCCCCCCCCCCCCCCC
DDDE FFFFFFFF GGGGGGGGG HHHHHHHH IIIIIII
JJ KKKLLL MM NN OOOOO
P QQ
'
);
insert into t (id, txt) values ( 2, 'RRRSTT UUU V WXYZ' );
This outputs:
ID | LINE_NO | SECTION_START | SECTION_END | SECTION_VALUE
-: | ------: | ------------: | ----------: | :------------
1 | 2 | 1 | 28 | A
1 | 2 | 31 | 34 | B
1 | 2 | 39 | 63 | C
1 | 3 | 1 | 3 | D
1 | 3 | 4 | 4 | E
1 | 3 | 7 | 14 | F
1 | 3 | 19 | 27 | G
1 | 3 | 43 | 50 | H
1 | 3 | 55 | 61 | I
1 | 4 | 1 | 2 | J
1 | 4 | 9 | 11 | K
1 | 4 | 12 | 14 | L
1 | 4 | 22 | 23 | M
1 | 4 | 25 | 26 | N
1 | 4 | 57 | 61 | O
1 | 5 | 13 | 13 | P
1 | 5 | 60 | 61 | Q
2 | 1 | 1 | 3 | R
2 | 1 | 4 | 4 | S
2 | 1 | 5 | 6 | T
2 | 1 | 8 | 10 | U
2 | 1 | 15 | 15 | V
2 | 1 | 17 | 17 | W
2 | 1 | 18 | 18 | X
2 | 1 | 19 | 19 | Y
2 | 1 | 20 | 20 | Z
db<>fiddle here

PostgreSQL - finding and updating multiple records

I have a table:
ID | rows | dimensions
---+------+-----------
1 | 1 | 15 x 20
2 | 3 | 2 x 10
3 | 5 | 23 x 33
3 | 7 | 15 x 23
4 | 2 | 12 x 32
And I want to have something like that:
ID | rows | dimensions
---+------+-----------
1 | 1 | 15 x 20
2 | 3 | 2 x 10
3a | 5 | 23 x 33
3b | 7 | 15 x 23
4 | 2 | 12 x 32
How can I find the duplicated ID values and make them unique?
And how can I update the table afterwards?
Thanks for your help!
with stats as (
    SELECT "ID",
           "rows",
           row_number() over (partition by "ID" order by "rows") as rn,
           count(*) over (partition by "ID") as cnt
    FROM Table1
)
UPDATE Table1
SET "ID" = CASE WHEN s.cnt > 1 THEN s."ID" || '-' || s.rn
                ELSE s."ID"
           END
FROM stats s
WHERE s."ID" = Table1."ID"
  AND s."rows" = Table1."rows";
I'm assuming you can't have two rows with the same ID and the same rows value; otherwise you need to include "dimensions" in the WHERE clause too.
In this case the output is
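A sketch of the same dedup idea in Python, using letter suffixes as in the question's expected output rather than the numeric -1/-2 suffixes the UPDATE above produces (the letter scheme is my assumption):

```python
from collections import Counter

# Hedged sketch: suffix duplicated IDs with a, b, ... ; IDs that occur
# once are left untouched. Sample (ID, rows) pairs are from the question.
rows = [(1, 1), (2, 3), (3, 5), (3, 7), (4, 2)]
counts = Counter(i for i, _ in rows)
seen = Counter()
new_ids = []
for i, _ in rows:
    seen[i] += 1
    # chr(96 + 1) == 'a', chr(96 + 2) == 'b', ...
    new_ids.append(f"{i}{chr(96 + seen[i])}" if counts[i] > 1 else str(i))
print(new_ids)  # ['1', '2', '3a', '3b', '4']
```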