PL/SQL - SQL dynamic row and column parsing

I took a look through the forums and couldn't really find what I needed.
What I have is two tables. One table (Parse_Table) holds the parsing spec:
File_ID|Start_Pos|Length|Description
------------------------------------
1 | 1 | 9 | Pos1
1 | 10 | 1 | Pos2
1 | 11 | 1 | Pos3
2 | 1 | 4 | Pos1
2 | 5 | 7 | Pos2
and another table (Input_file) whose strings need to be parsed:
String
ABCDEFGHI12
ASRQWERTQ45
123456789AB
321654852PO
I want a result where, if I pick a parsing spec such as
select DESCRIPTION, Start_pos, Length from Parse_table where File_ID=1
the input file is parsed with it:
String | Pos1 |Pos2|Pos3
---------------------------------
ABCDEFGHI12 |ABCDEFGHI | 1 | 2
ASRQWERTQ45 |ASRQWERTQ | 4 | 5
123456789AB |123456789 | A | B
321654852PO |321654852 | P | O
Alternatively, if I use file_id=2, the values would be parsed differently.
I looked at using the PIVOT clause, but as far as I know the number of columns it returns is static.
Thanks in advance for your support; please let me know what I can do in SQL.
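To make the requested behaviour concrete, here is a minimal Python sketch (not SQL, and not part of the question) that applies one parsing spec to one input string. `PARSE_TABLE` and `parse_line` are hypothetical names; start positions are treated as 1-based, like Oracle's SUBSTR.

```python
# Hypothetical in-memory copy of Parse_Table: (file_id, start_pos, length, description)
PARSE_TABLE = [
    (1, 1, 9, "Pos1"),
    (1, 10, 1, "Pos2"),
    (1, 11, 1, "Pos3"),
    (2, 1, 4, "Pos1"),
    (2, 5, 7, "Pos2"),
]


def parse_line(line, file_id):
    """Split one fixed-width line using the spec rows for file_id.

    start_pos is 1-based (as in Oracle's SUBSTR), so it is shifted by one
    before slicing the Python string.
    """
    spec = sorted((row for row in PARSE_TABLE if row[0] == file_id),
                  key=lambda row: row[1])
    return {desc: line[start - 1:start - 1 + length]
            for _, start, length, desc in spec}


print(parse_line("ABCDEFGHI12", 1))  # {'Pos1': 'ABCDEFGHI', 'Pos2': '1', 'Pos3': '2'}
print(parse_line("ABCDEFGHI12", 2))  # {'Pos1': 'ABCD', 'Pos2': 'EFGHI12'}
```

The number of keys returned depends on the spec, which is exactly why a PIVOT with a fixed column list does not fit.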

You can get "close-ish" with the standard DECODE trick for pivoting, assuming a ceiling on the maximum number of fields expected.
SQL> create table t ( fid int, st int, len int, pos varchar2(10));
Table created.
SQL>
SQL> insert into t values ( 1 , 1 , 9 , 'Pos1');
1 row created.
SQL> insert into t values ( 1 , 10 , 1 , 'Pos2');
1 row created.
SQL> insert into t values ( 1 , 11 , 1 , 'Pos3');
1 row created.
SQL> insert into t values ( 2 , 1 , 4 , 'Pos1');
1 row created.
SQL> insert into t values ( 2 , 5 , 7 , 'Pos2');
1 row created.
SQL>
SQL> create table t1 ( s varchar2(20));
Table created.
SQL>
SQL> insert into t1 values ('ABCDEFGHI12');
1 row created.
SQL> insert into t1 values ('ASRQWERTQ45');
1 row created.
SQL> insert into t1 values ('123456789AB');
1 row created.
SQL> insert into t1 values ('321654852PO');
1 row created.
SQL>
SQL>
SQL> select
       t1.s,
       max(decode(t.seq,1,substr(t1.s,t.st,t.len))) c1,
       max(decode(t.seq,2,substr(t1.s,t.st,t.len))) c2,
       max(decode(t.seq,3,substr(t1.s,t.st,t.len))) c3,
       max(decode(t.seq,4,substr(t1.s,t.st,t.len))) c4,
       max(decode(t.seq,5,substr(t1.s,t.st,t.len))) c5,
       max(decode(t.seq,6,substr(t1.s,t.st,t.len))) c6
     from t1,
          ( select t.*, row_number() over ( partition by fid order by st ) as seq
            from t
            where fid = 1
          ) t
     group by t1.s
     order by 1;
S C1 C2 C3 C4 C5 C6
-------------------- ------------- ------------- ------------- ------------- ------------- -------------
123456789AB 123456789 A B
321654852PO 321654852 P O
ABCDEFGHI12 ABCDEFGHI 1 2
ASRQWERTQ45 ASRQWERTQ 4 5
4 rows selected.
SQL>
SQL> select
       t1.s,
       max(decode(t.seq,1,substr(t1.s,t.st,t.len))) c1,
       max(decode(t.seq,2,substr(t1.s,t.st,t.len))) c2,
       max(decode(t.seq,3,substr(t1.s,t.st,t.len))) c3,
       max(decode(t.seq,4,substr(t1.s,t.st,t.len))) c4,
       max(decode(t.seq,5,substr(t1.s,t.st,t.len))) c5,
       max(decode(t.seq,6,substr(t1.s,t.st,t.len))) c6
     from t1,
          ( select t.*, row_number() over ( partition by fid order by st ) as seq
            from t
            where fid = 2
          ) t
     group by t1.s
     order by 1;
S C1 C2 C3 C4 C5 C6
-------------------- ------------- ------------- ------------- ------------- ------------- -------------
123456789AB 1234 56789AB
321654852PO 3216 54852PO
ABCDEFGHI12 ABCD EFGHI12
ASRQWERTQ45 ASRQ WERTQ45
4 rows selected.
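The same fixed-ceiling pivot can be tried outside Oracle. The sketch below assumes Python's built-in sqlite3 module as a stand-in; SQLite has no DECODE, so CASE is used instead, and the position number is computed with a correlated COUNT(*) rather than ROW_NUMBER(). The ceiling here is three columns (the Oracle answer uses six); unused columns simply come back as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (fid INTEGER, st INTEGER, len INTEGER, pos TEXT);
    CREATE TABLE t1 (s TEXT);
    INSERT INTO t VALUES (1, 1, 9, 'Pos1'), (1, 10, 1, 'Pos2'), (1, 11, 1, 'Pos3'),
                         (2, 1, 4, 'Pos1'), (2, 5, 7, 'Pos2');
    INSERT INTO t1 VALUES ('ABCDEFGHI12'), ('ASRQWERTQ45'), ('123456789AB'), ('321654852PO');
""")

# Fixed ceiling of 3 output columns; a spec with more fields would be cut off.
PIVOT_SQL = """
SELECT t1.s,
       MAX(CASE WHEN spec.seq = 1 THEN substr(t1.s, spec.st, spec.len) END) AS c1,
       MAX(CASE WHEN spec.seq = 2 THEN substr(t1.s, spec.st, spec.len) END) AS c2,
       MAX(CASE WHEN spec.seq = 3 THEN substr(t1.s, spec.st, spec.len) END) AS c3
FROM t1
CROSS JOIN (
    -- seq numbers the spec rows by start position, like ROW_NUMBER() in the answer
    SELECT x.*, (SELECT COUNT(*) FROM t y
                 WHERE y.fid = x.fid AND y.st <= x.st) AS seq
    FROM t x
    WHERE x.fid = ?
) spec
GROUP BY t1.s
ORDER BY 1
"""

for row in conn.execute(PIVOT_SQL, (2,)):
    print(row)
```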
If you really wanted that result to then come back with only the desired column count and custom column names, then you're into dynamic SQL territory. How you'd tackle that depends on the tool you are providing the data to. If it can consume a REF CURSOR, then a little PL/SQL would do the trick.
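A rough illustration of the dynamic SQL route, assuming a Python client with the standard sqlite3 module in place of a tool that consumes a REF CURSOR: the SELECT list is generated from the spec rows, and the caller discovers the column names at run time from the cursor metadata. `build_query` and `run_parse` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parse_table (file_id INTEGER, start_pos INTEGER,
                              length INTEGER, description TEXT);
    CREATE TABLE parse_file (string TEXT);
    INSERT INTO parse_table VALUES (1, 1, 9, 'Pos1'), (1, 10, 1, 'Pos2'),
        (1, 11, 1, 'Pos3'), (2, 1, 4, 'Pos1'), (2, 5, 7, 'Pos2');
    INSERT INTO parse_file VALUES ('ABCDEFGHI12'), ('ASRQWERTQ45'),
        ('123456789AB'), ('321654852PO');
""")


def build_query(file_id):
    """Generate a SELECT whose column list comes from the parse spec."""
    spec = conn.execute(
        "SELECT start_pos, length, description FROM parse_table "
        "WHERE file_id = ? ORDER BY start_pos", (file_id,)).fetchall()
    # Assumes the descriptions are trusted, valid column aliases:
    # they are pasted into the SQL text, not bound as parameters.
    cols = ", ".join(f'substr(string, {start}, {length}) AS "{desc}"'
                     for start, length, desc in spec)
    return f"SELECT string, {cols} FROM parse_file"


def run_parse(file_id):
    cur = conn.execute(build_query(file_id))
    # The caller learns the column names only at run time, from cursor metadata.
    names = [d[0] for d in cur.description]
    return names, cur.fetchall()


names, rows = run_parse(1)
print(names)  # ['string', 'Pos1', 'Pos2', 'Pos3']
```

Because the descriptions are concatenated into the SQL text, this is only safe if the spec table is trusted; anything user-supplied would need validating first.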

An unknown number of columns can be returned from a SQL statement, but it requires code built with PL/SQL, ANY types, and Oracle Data Cartridge.
That code is tricky to write, but you can start with my open source project Method4. Download, unzip, install, and then
write a SQL statement that generates a SQL statement.
Query
select * from table(method4.dynamic_query(
q'[
--Create a SQL statement to query PARSE_FILE.
select
'select '||
listagg(column_expression, ',') within group (order by start_pos) ||
' from parse_file'
column_expressions
from
(
--Create individual SUBSTR column expressions.
select
parse_table.*,
'substr(string, '||start_pos||', '||length||') '||description column_expression
from parse_table
--CHANGE BELOW LINE TO USE A DIFFERENT FILE:
where file_id = 2
order by start_pos
)
]'
));
Sample Schema
create table parse_table as
select 1 file_id, 1 start_pos, 9 length, 'Pos1' description from dual union all
select 1 file_id, 10 start_pos, 1 length, 'Pos2' description from dual union all
select 1 file_id, 11 start_pos, 1 length, 'Pos3' description from dual union all
select 2 file_id, 1 start_pos, 4 length, 'Pos1' description from dual union all
select 2 file_id, 5 start_pos, 7 length, 'Pos2' description from dual;
create table parse_file as
select 'ABCDEFGHI12' string from dual union all
select 'ASRQWERTQ45' string from dual union all
select '123456789AB' string from dual union all
select '321654852PO' string from dual;
Results
When FILE_ID = 1:
POS1 POS2 POS3
---- ---- ----
ABCDEFGHI 1 2
ASRQWERTQ 4 5
123456789 A B
321654852 P O
When FILE_ID = 2:
POS1 POS2
---- ----
ABCD EFGHI12
ASRQ WERTQ45
1234 56789AB
3216 54852PO

Related

ORACLE SQL : IF EXISTS UPDATE ELSE INSERT

Let's say I have data in an Oracle database like this:
TRANSFERNUMBER | VALUE1 | VALUE2
2250 | 1000 | 2000
2251 | 1000 | 3000
My main purpose is this: when I add data to the table, if the row already exists it should be updated; if it doesn't exist, a new row should be inserted. That is why I want to use IF EXISTS in my query.
However, I can't get the query to work. I also can't write a procedure, for reasons related to the table. Can anyone help me write this as a query in Oracle?
MERGE is what we usually do. Here's an example:
Test table and sample data:
SQL> create table test (tn number, val1 number, val2 number);
Table created.
SQL> insert into test
  select 2250, 1000, 2000 from dual union all
  select 2251, 1000, 3000 from dual;
2 rows created.
SQL> select * From test order by tn;
TN VAL1 VAL2
---------- ---------- ----------
2250 1000 2000
2251 1000 3000
How to do it? The USING clause represents the data you're going to insert or update:
SQL> merge into test t
  using (select 2250 tn, 1 val1, 2 val2 from dual union all   --> for update
         select 3000   , 8     , 9      from dual             --> for insert
        ) x
  on (t.tn = x.tn)
  when matched then update set t.val1 = x.val1,
                               t.val2 = x.val2
  when not matched then insert values (x.tn, x.val1, x.val2);
2 rows merged.
Result:
SQL> select * From test order by tn;
TN VAL1 VAL2
---------- ---------- ----------
2250 1 2 --> updated
2251 1000 3000
3000 8 9 --> inserted
SQL>
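For comparison, on a database without MERGE the same "update if present, otherwise insert" behaviour is often emulated in two steps. This Python sketch uses the standard sqlite3 module (an assumption, not Oracle) and the hypothetical helper `upsert`: it runs the UPDATE first and only INSERTs when no row was updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (tn INTEGER PRIMARY KEY, val1 INTEGER, val2 INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)",
                 [(2250, 1000, 2000), (2251, 1000, 3000)])


def upsert(rows):
    """Update rows whose key already exists, insert the rest (MERGE-like)."""
    for tn, val1, val2 in rows:
        cur = conn.execute("UPDATE test SET val1 = ?, val2 = ? WHERE tn = ?",
                           (val1, val2, tn))
        if cur.rowcount == 0:  # "when not matched": nothing was updated
            conn.execute("INSERT INTO test VALUES (?, ?, ?)", (tn, val1, val2))
    conn.commit()


upsert([(2250, 1, 2), (3000, 8, 9)])
print(conn.execute("SELECT * FROM test ORDER BY tn").fetchall())
# [(2250, 1, 2), (2251, 1000, 3000), (3000, 8, 9)]
```

Unlike a single MERGE statement, the two-step version can race with another session inserting the same key between the UPDATE and the INSERT; the primary key on TN is what ultimately prevents duplicates.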

How to generate a dynamic sequence in Oracle

I have a table A which represents a valid sequence of numbers, which looks something like this:
| id | start | end | step |
|----|-------|-------|------|
| 1 | 4000 | 4999 | 4 |
| 2 | 3 | 20000 | 1 |
A[1] thus represents the sequence [4000, 4004, 4008, ...4996]
and another B of "occupied" numbers that looks like this:
| id | number | ... |
|-----|--------|-----|
| 1 | 4000 | ... |
| 2 | 4003 | ... |
| ... | ... | ... |
I want to construct a query that, using A and B, finds the first unoccupied number for a particular sequence.
What I have been trying (and failing) to do is generate a list of valid numbers from a row in A, LEFT OUTER JOIN table B on B.number = valid_number, keep the rows where B.id is null, and then select min(...) from that result.
How about this?
I simplified your test case (the END value isn't that high) to save space (otherwise I'd have to use a smaller font :)).
What does it do?
CTEs A and B are your sample data.
FULL_ASEQ creates a sequence of numbers from table A; if you want to see what it returns, remove everything from line #17 onward and run select * from full_aseq instead.
The final query returns the first available sequence number, i.e. the one that hasn't been used yet (lines #19 - 23).
Here you go:
SQL> with
2 a (id, cstart, cend, step) as
3 (select 1, 4000, 4032, 4 from dual union all
4 select 2, 3, 20, 1 from dual
5 ),
6 b (id, cnumber) as
7 (select 1, 4000 from dual union all
8 select 1, 4004 from dual union all
9 select 2, 4003 from dual
10 ),
11 full_aseq as
12 (select a.id, a.cstart + (column_value - 1) * a.step seq_val
13 from a cross join table(cast(multiset(select level from dual
14 connect by level <= (a.cend - a.cstart) / a.step + 1
15 ) as sys.odcinumberlist))
16 )
17 select f.id, min(f.seq_val) min_seq_val
18 from full_aseq f
19 where not exists (select null
20 from b
21 where b.id = f.id
22 and b.cnumber = f.seq_val
23 )
24 group by f.id;
ID MIN_SEQ_VAL
---------- -----------
1 4008
2 3
SQL>
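The same "generate, then anti-join" idea can be expressed with a recursive CTE instead of CONNECT BY (Oracle also supports recursive subquery factoring from 11g Release 2). This sketch runs it through Python's sqlite3 module as a stand-in for Oracle; its anchor member is CSTART itself, so a start value that is still free is reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, cstart INTEGER, cend INTEGER, step INTEGER);
    CREATE TABLE b (id INTEGER, cnumber INTEGER);
    INSERT INTO a VALUES (1, 4000, 4032, 4), (2, 3, 20, 1);
    INSERT INTO b VALUES (1, 4000), (1, 4004), (2, 4003);
""")

FIRST_FREE_SQL = """
WITH RECURSIVE full_aseq (id, seq_val, cend, step) AS (
    SELECT id, cstart, cend, step FROM a        -- anchor: cstart itself
    UNION ALL
    SELECT id, seq_val + step, cend, step       -- recursion: previous value + step
    FROM full_aseq
    WHERE seq_val + step <= cend
)
SELECT f.id, MIN(f.seq_val) AS min_seq_val
FROM full_aseq f
WHERE NOT EXISTS (SELECT 1 FROM b
                  WHERE b.id = f.id AND b.cnumber = f.seq_val)
GROUP BY f.id
ORDER BY f.id
"""

print(conn.execute(FIRST_FREE_SQL).fetchall())  # [(1, 4008), (2, 3)]
```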
You can use LEAD to compute the difference between ordered rows in table B. Any row having a difference (to the next row) that exceeds the step value for that sequence is a gap.
Here's that concept, implemented (below). I threw in a sequence ID "3" that has no values in table B, to illustrate that it generates the proper first value.
with
a (id, cstart, cend, step) as
(select 1, 4000, 4032, 4 from dual union all
select 2, 3, 20000, 1 from dual union all
select 3, 100, 200, 3 from dual
),
b (id, cnumber) as
(select 1, 4000 from dual union all
select 1, 4004 from dual union all
select 1, 4012 from dual union all
select 2, 4003 from dual
),
work1 as (
select a.id,
b.cnumber cnumber,
lead(b.cnumber,1) over ( partition by b.id order by b.cnumber ) - b.cnumber diff,
a.step,
a.cstart,
a.cend
from a left join b on b.id = a.id )
select w1.id,
CASE WHEN min(w1.cnumber) is null THEN w1.cstart
WHEN min(w1.cnumber)+w1.step < w1.cend THEN min(w1.cnumber)+w1.step
ELSE null END next_cnumber
from work1 w1
where ( diff is null or diff > w1.step )
group by w1.id, w1.step, w1.cstart, w1.cend
order by w1.id
+----+--------------+
| ID | NEXT_CNUMBER |
+----+--------------+
| 1 | 4008 |
| 2 | 4004 |
| 3 | 100 |
+----+--------------+
You can further improve the results by excluding rows in table B that are impossible for the sequence. E.g., exclude a row for ID #1 having a value of, say, 4007.
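A Python sketch of the LEAD-based gap search for a single sequence. It follows the answer's logic (sort the occupied values, compare each with the next one, report the first gap), applies the filtering suggested above so impossible values such as 4007 are ignored, and adds one check the SQL does not have: if CSTART itself is unused, it is returned. `next_free` is a hypothetical name, and returning None when the sequence is exhausted is an assumption.

```python
def next_free(cstart, cend, step, occupied):
    """Return the first unused value of the sequence cstart, cstart+step, ..., <= cend."""
    # Keep only values that can belong to this sequence.
    vals = sorted(v for v in set(occupied)
                  if cstart <= v <= cend and (v - cstart) % step == 0)
    # Added check (not in the SQL answer): the start value may itself be free.
    if not vals or vals[0] != cstart:
        return cstart
    # Like LEAD(): compare each value with the next; a difference above step is a gap.
    for cur, nxt in zip(vals, vals[1:]):
        if nxt - cur > step:
            return cur + step
    # No gap: the next value after the last occupied one, if still in range.
    last_next = vals[-1] + step
    return last_next if last_next <= cend else None


print(next_free(4000, 4032, 4, [4000, 4004, 4012]))  # 4008
```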
I'll ask the obvious question: why not use an actual sequence?
SQL> set timing on
SQL> CREATE SEQUENCE SEQ_TEST_A
START WITH 4000
INCREMENT BY 4
MINVALUE 4000
MAXVALUE 4999
NOCACHE
NOCYCLE
ORDER;
Sequence created.
Elapsed: 00:00:01.09
SQL> CREATE SEQUENCE SEQ_TEST_B
START WITH 3
INCREMENT BY 1
MINVALUE 3
MAXVALUE 20000
NOCACHE
NOCYCLE
ORDER;
Sequence created.
Elapsed: 00:00:00.07
SQL> -- get nextvals from A
SQL> select seq_test_a.nextval from dual;
NEXTVAL
----------
4000
1 row selected.
Elapsed: 00:00:00.09
SQL> select seq_test_a.nextval from dual;
NEXTVAL
----------
4004
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_a.nextval from dual;
NEXTVAL
----------
4008
1 row selected.
Elapsed: 00:00:00.08
SQL> -- get nextvals from B
SQL> select seq_test_b.nextval from dual;
NEXTVAL
----------
3
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_b.nextval from dual;
NEXTVAL
----------
4
1 row selected.
Elapsed: 00:00:00.08
SQL> select seq_test_b.nextval from dual;
NEXTVAL
----------
5
1 row selected.
Elapsed: 00:00:00.08
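A Python generator that mimics which values such a sequence hands out, modelling only the options used above (START WITH, INCREMENT BY, MAXVALUE, NOCYCLE). It is a model for reasoning, not a replacement: `sequence_values` is a hypothetical name, and caching, ordering and concurrency are ignored.

```python
def sequence_values(start, increment, maxvalue):
    """Yield the values a NOCYCLE sequence would hand out, in order.

    Once MAXVALUE would be exceeded the generator simply stops
    (a real NOCYCLE sequence raises an error instead).
    """
    value = start
    while value <= maxvalue:
        yield value
        value += increment


seq_a = sequence_values(4000, 4, 4999)
print(next(seq_a), next(seq_a), next(seq_a))  # 4000 4004 4008
```

Note that a sequence only moves forward; it does not look at table B, so numbers already occupied there are not taken into account.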

SQL to distinct part of the string - Oracle SQL

I have a table table1 with column line which is of type CLOB
Here are the values:
seq line
------------------------------
1 ISA*00*TEST
ISA*00*TEST1
GS*123GG*TEST*456:EHE
ST*ERT*RFR*EDRR*EER
GS*123GG*TEST*456:EHE
-------------------------------
2 ISA*01*TEST
GS*124GG*TEST*456:EHE
GS*125GG*TEST*456:EHE
ST*ERQ*RFR*EDRR*EER
ST*ERW*RFR*EDRR*EER
ST*ERR*RFR*EDRR*EER
I am trying to find the distinct values of the substring before the second star, and count them.
The output would be:
distinct_line_value count
ISA*00 2
GS*123GG 2
ST*ERT 1
ISA*01 1
GS*124GG 1
GS*125GG 1
ST*ERQ 1
ST*ERW 1
ST*ERR 1
Any ideas how I can get the distinct values up to the second star?
Here's one option:
Test case:
SQL> select * from test;
SEQ LINE
---------- --------------------------------------------------
1 ISA*00*TEST
ISA*00*TEST1
GS*123GG*TEST*456:EHE
ST*ERT*RFR*EDRR*EER
GS*123GG*TEST
2 ISA*01*TEST
GS*124GG*TEST*456:EHE
GS*125GG*TEST*456:EHE
ST*ERQ*RFR*EDRR*EER
ST*E
Query (see the comments within the code; apart from that, REGEXP_SUBSTR is crucial here, along with its 'm' match parameter, which treats the input string as multiple lines):
SQL> with
  -- split CLOB values to rows
  inter as
    (select seq,
            regexp_substr(line, '^.*$', 1, column_value, 'm') res
     from test,
          table(cast(multiset(select level from dual
                              connect by level <= regexp_count(line, chr(10)) + 1
                             ) as sys.odcinumberlist))
    ),
  -- convert CLOB to VARCHAR2 (so the substrings can be grouped; GROUP BY does not accept CLOB values)
  inter2 as
    (select to_char(res) res From inter)
  -- the final result
  select substr(res, 1, instr(res, '*', 1, 2)) val, count(*)
  from inter2
  group by substr(res, 1, instr(res, '*', 1, 2))
  order by 1;
VAL COUNT(*)
-------------------------------------------------- ----------
GS*123GG* 2
GS*124GG* 1
GS*125GG* 1
ISA*00* 2
ISA*01* 1
ST*ERQ* 1
ST*ERR* 1
ST*ERT* 1
ST*ERW* 1
9 rows selected.
SQL>
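The same grouping logic, sketched in plain Python for checking expected counts. Unlike the SQL above, which keeps the trailing '*' (SUBSTR up to the INSTR of the second star), this version drops it to match the output the question asked for, and it keeps the whole line when there is no second star; both choices are assumptions, and the helper names are hypothetical.

```python
from collections import Counter


def prefix_before_second_star(line):
    """Return the text before the second '*', or the whole line if there is none."""
    first = line.find("*")
    second = line.find("*", first + 1) if first != -1 else -1
    return line if second == -1 else line[:second]


def count_prefixes(clob_values):
    counts = Counter()
    for text in clob_values:
        # One CLOB value holds several lines, so split on line breaks first.
        for line in text.splitlines():
            counts[prefix_before_second_star(line)] += 1
    return counts


clob_rows = [
    "ISA*00*TEST\nISA*00*TEST1\nGS*123GG*TEST*456:EHE\n"
    "ST*ERT*RFR*EDRR*EER\nGS*123GG*TEST*456:EHE",
    "ISA*01*TEST\nGS*124GG*TEST*456:EHE\nGS*125GG*TEST*456:EHE\n"
    "ST*ERQ*RFR*EDRR*EER\nST*ERW*RFR*EDRR*EER\nST*ERR*RFR*EDRR*EER",
]
print(count_prefixes(clob_rows))
```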

Highest per each group

It's hard to show my actual table and data here so I'll describe my problem with a sample table and data:
create table foo(id int,x_part int,y_part int,out_id int,out_idx text);
insert into foo values (1,2,3,55,'BAK'),(2,3,4,77,'ZAK'),(3,4,8,55,'RGT'),(9,10,15,77,'UIT'),
(3,4,8,11,'UTL'),(3,4,8,65,'MAQ'),(3,4,8,77,'YTU');
Following is the table foo:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
1 2 3 55 BAK
3 4 8 65 MAQ
9 10 15 77 UIT
2 3 4 77 ZAK
3 4 8 77 YTU
I need to select all fields of the row with the highest id for each out_id.
Expected output:
id x_part y_part out_id out_idx
-- ------ ------ ------ -------
3 4 8 11 UTL
3 4 8 55 RGT
3 4 8 65 MAQ
9 10 15 77 UIT
Using PostgreSQL.
Postgres specific (and fastest) solution:
select distinct on (out_id) *
from foo
order by out_id, id desc;
Standard SQL solution using a window function (second fastest)
select id, x_part, y_part, out_id, out_idx
from (
select id, x_part, y_part, out_id, out_idx,
row_number() over (partition by out_id order by id desc) as rn
from foo
) t
where rn = 1
order by id;
Note that both solutions return only one row per out_id, even if several rows share the highest id for that out_id. If you want all of those tied rows returned, use dense_rank() instead of row_number().
select *
from foo
where (id,out_id) in (
select max(id),out_id from foo group by out_id
) order by out_id
Finding max(val) := finding the record for which no larger val exists:
SELECT *
FROM foo f
WHERE NOT EXISTS (
SELECT 317
FROM foo nx
WHERE nx.out_id = f.out_id
AND nx.id > f.id
);
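If the rows are already in application memory, the same "highest id per out_id" selection can be done in one pass. This Python sketch (hypothetical helper `highest_id_per_out_id`) mirrors DISTINCT ON (out_id) ... ORDER BY out_id, id DESC; on ties it keeps the first row seen, much as row_number() picks one row.

```python
def highest_id_per_out_id(rows):
    """Keep, for each out_id, the row with the largest id (first one wins on ties)."""
    best = {}
    for row in rows:
        row_id, out_id = row[0], row[3]
        if out_id not in best or row_id > best[out_id][0]:
            best[out_id] = row
    return [best[k] for k in sorted(best)]


foo = [
    (1, 2, 3, 55, "BAK"), (2, 3, 4, 77, "ZAK"), (3, 4, 8, 55, "RGT"),
    (9, 10, 15, 77, "UIT"), (3, 4, 8, 11, "UTL"), (3, 4, 8, 65, "MAQ"),
    (3, 4, 8, 77, "YTU"),
]
for row in highest_id_per_out_id(foo):
    print(row)
```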

Table transformation / field parsing in PL/SQL

I have a denormalized table, something like:
CODES
ID | VALUE
10 | A,B,C
11 | A,B
12 | A,B,C,D,E,F
13 | R,T,D,W,W,W,W,W,S,S
The job is to convert it so that each token from VALUE generates a new row. Example:
CODES_TRANS
ID | VALUE_TRANS
10 | A
10 | B
10 | C
11 | A
11 | B
What is the best way to do this in PL/SQL without using custom PL/SQL packages, ideally in pure SQL?
The obvious solution is to implement it with cursors. Any ideas?
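To pin down the target transformation, here is the row-by-row version in Python (the "obvious" procedural approach, not the pure SQL the question asks for). `split_codes` is a hypothetical name; it splits on every comma, so an empty item such as "A,,B" would produce an empty token.

```python
def split_codes(rows):
    """Turn (id, 'A,B,C') rows into one (id, token) row per token."""
    result = []
    for row_id, value in rows:
        for token in value.split(","):
            result.append((row_id, token))
    return result


codes = [(10, "A,B,C"), (11, "A,B"), (12, "A,B,C,D,E,F"),
         (13, "R,T,D,W,W,W,W,W,S,S")]
codes_trans = split_codes(codes)
print(codes_trans[:5])  # [(10, 'A'), (10, 'B'), (10, 'C'), (11, 'A'), (11, 'B')]
```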
Another alternative is to use the model clause:
SQL> select id
     , value
  from codes
 model
   return updated rows
   partition by (id)
   dimension by (-1 i)
   measures (value)
   ( value[for i from 0 to length(value[-1])-length(replace(value[-1],',')) increment 1]
     = regexp_substr(value[-1],'[^,]+',1,cv(i)+1)
   )
 order by id
     , i
/
ID VALUE
---------- -------------------
10 A
10 B
10 C
11 A
11 B
12 A
12 B
12 C
12 D
12 E
12 F
13 R
13 T
13 D
13 W
13 W
13 W
13 W
13 W
13 S
13 S
21 rows selected.
I have written up to 6 alternatives for this type of query in this blogpost: http://rwijk.blogspot.com/2007/11/interval-based-row-generation.html
Regards,
Rob.
I have a pure SQL solution for you.
I adapted a trick I found on an old Ask Tom site, posted by Mihail Bratu. My adaptation uses regex to tokenise the VALUE column, so it requires 10g or higher.
The test data.
SQL> select * from t34
/
ID VALUE
---------- -------------------------
10 A,B,C
11 A,B
12 A,B,C,D,E,F
13 R,T,D,W1,W2,W3,W4,W5,S,S
SQL>
The query:
SQL> select t34.id
     , t.column_value value
  from t34
     , table(cast(multiset(
         select regexp_substr (t34.value, '[^(,)]+', 1, level)
         from dual
         connect by level <= length(value)
       ) as sys.dbms_debug_vc2coll )) t
 where t.column_value != ','
/
ID VALUE
---------- -------------------------
10 A
10 B
10 C
11 A
11 B
12 A
12 B
12 C
12 D
12 E
12 F
13 R
13 T
13 D
13 W1
13 W2
13 W3
13 W4
13 W5
13 S
13 S
21 rows selected.
SQL>
Based on Celko's book, here is what I found and it's working well!
SELECT
TABLE1.ID
, MAX(SEQ1.SEQ) AS START_POS
, SEQ2.SEQ AS END_POS
, COUNT(SEQ2.SEQ) AS PLACE
FROM
TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
WHERE
SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
AND SEQ1.SEQ < SEQ2.SEQ
AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE) + 2
GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ
Where V_SEQ is a static table with one field:
SEQ, integer values 1 through N, where N >= MAX(LENGTH(VALUE)) + 2.
This is based on the fact that the VALUE is wrapped by ',' on both ends, like this:
,A,B,C,D,
If your tokens are fixed-length (as in my case), you can simply use the PLACE field to calculate the actual string; if they are variable-length, use START_POS and END_POS.
In my case the tokens are 2 characters long, so the final SQL is:
SELECT
TABLE1.ID
, SUBSTR(TABLE1.VALUE, T_SUB.PLACE * 3 - 2 , 2 ) AS SINGLE_VAL
FROM
(
SELECT
TABLE1.ID
, MAX(SEQ1.SEQ) AS START_POS
, SEQ2.SEQ AS END_POS
, COUNT(SEQ2.SEQ) AS PLACE
FROM
TABLE1, V_SEQ SEQ1, V_SEQ SEQ2
WHERE
SUBSTR(',' || TABLE1.VALUE || ',', SEQ1.SEQ, 1) = ','
AND SUBSTR(',' || TABLE1.VALUE || ',', SEQ2.SEQ, 1) = ','
AND SEQ1.SEQ < SEQ2.SEQ
AND SEQ2.SEQ <= LENGTH(TABLE1.VALUE) + 2
GROUP BY TABLE1.ID, TABLE1.VALUE, SEQ2.SEQ
) T_SUB
INNER JOIN
TABLE1 ON TABLE1.ID = T_SUB.ID
ORDER BY TABLE1.ID, T_SUB.PLACE
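The V_SEQ self-join can be traced in Python to see what it computes. This sketch (hypothetical helpers `token_positions` and `fixed_token`) finds the 1-based comma positions of ',' || VALUE || ',', pairs each comma with the previous one, and numbers the pairs; note that the closing comma of the wrapped string sits at position LENGTH(VALUE) + 2. The `width` parameter generalises the PLACE * 3 - 2 formula and is an assumption.

```python
def token_positions(value):
    """Return (place, start_pos, end_pos, token) for each token of value.

    Mirrors the self-join: every comma position of the wrapped string
    (1-based, like SUBSTR) is found, and each pair of consecutive commas
    (start_pos, end_pos) encloses one token.
    """
    wrapped = "," + value + ","
    commas = [i + 1 for i, ch in enumerate(wrapped) if ch == ","]
    result = []
    for place, (start_pos, end_pos) in enumerate(zip(commas, commas[1:]), start=1):
        # 1-based positions start_pos+1 .. end_pos-1 are 0-based start_pos .. end_pos-2
        result.append((place, start_pos, end_pos, wrapped[start_pos:end_pos - 1]))
    return result


def fixed_token(value, place, width=2):
    """SUBSTR(VALUE, PLACE * (width + 1) - width, width); width=2 gives PLACE * 3 - 2."""
    start = place * (width + 1) - width  # 1-based start of the token in VALUE
    return value[start - 1:start - 1 + width]


print(token_positions("A,B,C"))  # [(1, 1, 3, 'A'), (2, 3, 5, 'B'), (3, 5, 7, 'C')]
```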
Original Answer
In SQL Server TSQL we parse strings and make a table object. Here is sample code - maybe you can translate it.
http://rbgupta.blogspot.com/2007/10/tsql-parsing-delimited-string-into.html
Second Option
Count the number of commas per row and get the maximum. Let's say that the row with the most commas in the entire table has 5. Build a SELECT with 6 substrings (one per token, i.e. one more than the comma count). This makes it a set-based operation, which should be much faster than RBAR (row-by-agonizing-row) processing.
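One way to read this option, sketched in Python: find the maximum comma count, then generate a SELECT with one column per token position (one more than that count). Using Oracle's REGEXP_SUBSTR(value, '[^,]+', 1, n) for the n-th token, and the table and column names `codes`/`value`, are assumptions for illustration; the answer does not say which substring expression it had in mind. `build_wide_select` is a hypothetical helper.

```python
def build_wide_select(values, table="codes", column="value"):
    """Build a SELECT with one column per token position, sized by the max comma count."""
    max_commas = max(v.count(",") for v in values)
    # n-th token via REGEXP_SUBSTR (assumed choice of substring expression)
    cols = ", ".join(f"regexp_substr({column}, '[^,]+', 1, {n}) tok{n}"
                     for n in range(1, max_commas + 2))
    return f"select id, {cols} from {table}"


print(build_wide_select(["A,B,C", "A,B"]))
```

The generated statement produces one wide row per input row; turning those columns into rows would still need an unpivot step.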