Oracle SQL regexp sum within repeating sequence grouped by

First time here, hope someone can help
What Oracle SQL SELECT statement would return the result below from the following XML, which is stored in an Oracle CLOB column? (There can be as many as 96 intvColl blocks per day.)
Basically, the numeric values between the 2nd and 3rd commas in the intvColl blocks need to be summed, grouped by the date before the first comma and by the varchar after the 3rd comma.
I'm guessing regexp_substr is the way, but I can't quite get there.
The first record is the sum of the 1st and 2nd intvColl blocks.
The second record is the sum of the 3rd intvColl block.
The third record is the sum of the 4th and 5th.
MeterChannelID          Date        Sum    Quality  Count_of_records
6103044759-40011200-Q1  14/03/2016  1,387  A        2
6103044759-40011200-Q1  14/03/2016  694    S        1
6103044759-40011200-Q1  15/03/2016  1,433  A        2
<uploadRegData>
<intervalDataBlock>
<setDateTime>16/03/2016-19:30:01</setDateTime>
<intervalMinute>15</intervalMinute>
<meterChannelID>6103044759-40011200-Q1</meterChannelID>
<intvColl><intvData>14/03/2016,1,700,A</intvData></intvColl>
<intvColl><intvData>14/03/2016,2,687,A</intvData></intvColl>
<intvColl><intvData>14/03/2016,3,694,S</intvData></intvColl>
<intvColl><intvData>15/03/2016,4,724,A</intvData></intvColl>
<intvColl><intvData>15/03/2016,5,709,A</intvData></intvColl>
</intervalDataBlock>
</uploadRegData>

SELECT MeterChannelID,
"Date",
SUM( value ) AS "Sum",
Quality,
COUNT(1) AS Count_of_Records
FROM (
SELECT MeterChannelID,
TO_DATE( SUBSTR( data, 1, 10 ), 'DD/MM/YYYY' ) AS "Date",
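TO_NUMBER( SUBSTR(
data,
INSTR( data, ',', 1, 2 ) + 1, -- start just after the 2nd comma
LENGTH( data ) - INSTR( data, ',', 1, 2 ) - 2 -- stop before the 3rd comma (assumes a 1-character quality flag)
) ) AS value,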
SUBSTR( data, -1 ) AS Quality
FROM (
SELECT EXTRACTVALUE( xml, '/uploadRegData/intervalDataBlock/meterChannelID' )
AS MeterChannelID,
EXTRACTVALUE( d.COLUMN_VALUE, '/intvData' ) AS data
FROM ( SELECT XMLType( column_name ) AS xml FROM table_name ) x,
TABLE(
XMLSequence(
EXTRACT(
x.xml,
'/uploadRegData/intervalDataBlock/intvColl/intvData'
)
)
) d
)
)
GROUP BY MeterChannelID, "Date", Quality
ORDER BY MeterChannelID, "Date", Quality;
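Oracle documents EXTRACTVALUE and XMLSequence as deprecated in later releases, so the same grouping can also be sketched with XMLTABLE and REGEXP_SUBSTR (the function the question was reaching for). This is only a sketch under assumptions: the table_name CTE stands in for the real table and CLOB column, the VARCHAR2 sizes are guesses, and the pattern '[^,]+' assumes that no comma-separated field is ever empty.

```sql
-- Stand-in for the real table; column_name holds the XML as a CLOB.
WITH table_name ( column_name ) AS (
  SELECT TO_CLOB(
    '<uploadRegData><intervalDataBlock>'
    || '<meterChannelID>6103044759-40011200-Q1</meterChannelID>'
    || '<intvColl><intvData>14/03/2016,1,700,A</intvData></intvColl>'
    || '<intvColl><intvData>14/03/2016,2,687,A</intvData></intvColl>'
    || '<intvColl><intvData>14/03/2016,3,694,S</intvData></intvColl>'
    || '<intvColl><intvData>15/03/2016,4,724,A</intvData></intvColl>'
    || '<intvColl><intvData>15/03/2016,5,709,A</intvData></intvColl>'
    || '</intervalDataBlock></uploadRegData>' )
  FROM dual
)
SELECT meter_channel_id,
       interval_date AS "Date",
       SUM( value )  AS "Sum",
       quality,
       COUNT(*)      AS count_of_records
FROM (
  SELECT b.meter_channel_id,
         -- 1st, 3rd and 4th comma-separated fields of each intvData value
         TO_DATE( REGEXP_SUBSTR( i.data, '[^,]+', 1, 1 ), 'DD/MM/YYYY' ) AS interval_date,
         TO_NUMBER( REGEXP_SUBSTR( i.data, '[^,]+', 1, 3 ) )             AS value,
         REGEXP_SUBSTR( i.data, '[^,]+', 1, 4 )                          AS quality
  FROM   table_name t
         -- one row per intervalDataBlock, carrying its intvColl children along
         CROSS JOIN XMLTABLE(
           '/uploadRegData/intervalDataBlock'
           PASSING XMLTYPE( t.column_name )
           COLUMNS meter_channel_id VARCHAR2(50) PATH 'meterChannelID',
                   intervals        XMLTYPE      PATH 'intvColl'
         ) b
         -- one row per intvData value inside that block
         CROSS JOIN XMLTABLE(
           '/intvColl/intvData'
           PASSING b.intervals
           COLUMNS data VARCHAR2(100) PATH '.'
         ) i
)
GROUP BY meter_channel_id, interval_date, quality
ORDER BY meter_channel_id, interval_date, quality;
```

With the five sample intvColl blocks this should produce the three rows asked for in the question; splitting on the field position rather than on character offsets also avoids assuming that the quality flag is exactly one character long.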

Related

Split string by number of character in Oracle SQL

I want to split a string every 70 characters. The code below, which I found elsewhere, does something slightly different: it splits at a maximum of 70 characters using the last space as the break, whereas I want to split at exactly 70 characters regardless of spaces. I tried to change the code, but I am still learning the regexp_substr function and cannot figure out where the last space is taken into account. It would be very helpful if someone could guide me through this. Thanks.
Code:
with val as (
select 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.' str
from dual
), words as (
select regexp_substr(str, '[^ ]+', 1, level) w
from val
connect by regexp_substr(str, '[^ ]+', 1, level) is not null
), grps as (
select * from words
match_recognize (
measures
match_number () grp,
count ( text.* ) word#
all rows per match
pattern ( init text* )
define
text as sum ( length ( w ) ) + count ( text.* ) <= 70
)
)
select listagg ( w, ' ' )
within group ( order by word# ) split_strs
from grps
group by grp
order by grp;
Result from above code:
What I am looking to achieve is,
You need a hierarchical query that uses the SUBSTR() function, such as
SELECT level-1 AS "Row_Number",
SUBSTR(str,70*(level-1)+1,70) -- third argument is the chunk length, not the end position
FROM val
CONNECT BY level <= CEIL(LENGTH(str)/70)
For exactly 70 chars you can use SUBSTR instead of regexp_substr,
e.g. first 70 chars = SUBSTR(str,1,70)
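A self-contained sketch of the fixed-width split, reusing the val CTE and the sample sentence from the question. SUBSTR takes a start position and a length, so every chunk starts 70 characters after the previous one and asks for 70 characters; the last chunk is simply shorter. The column aliases chunk_no and chunk are made up for illustration, and the CONNECT BY trick assumes val holds a single row.

```sql
WITH val AS (
  SELECT 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.' AS str
  FROM   dual
)
SELECT level                                     AS chunk_no,
       SUBSTR( str, 70 * ( level - 1 ) + 1, 70 ) AS chunk  -- start position, then length
FROM   val
CONNECT BY level <= CEIL( LENGTH( str ) / 70 )   -- one row per 70-character chunk
ORDER BY chunk_no;
```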
This one did the trick; thanks to those who responded before I got to this.
with val as (
select 'I am also looking for similar solution but slightly different where the split occurs at 70 character and not considering last space.' str
from dual
), words as (
select regexp_substr(str, '(.{1})', 1, level) w
from val
connect by regexp_substr(str, '(.{1})', 1, level) is not null
), grps as (
select * from words
match_recognize (
measures
match_number () grp,
count ( text.* ) word#
all rows per match
pattern ( init text* )
define
text as sum ( length ( w ) ) <= 70
)
)
select listagg ( w, '' )
within group ( order by word# ) split_strs
from grps
group by grp
order by grp
;

SQL How to perform multiple look-ups from a list, in one query

We have a weird database table (wt) for which I can construct a query that can return a single row with these fields:
wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9
Then, there's another lookup table (dt) that holds descriptions for these numbers, you could imagine it like this:
id desc
5 "flour"
12 "cups"
9 "barley"
What I need to end up with is the numbers from wt, along with their descriptions from dt.
I can do 3 simple queries, one to look up each of my three thing_ values (select desc from dt where id = ) but I was hoping to do it all in one query.
Is there a way to do this?
Even better, is there a way to run my query to get my single row of thing ids and combine them with their descriptions? I think the fundamental problem is that my thing ids are not one per row; they come back as fields in a single row. This makes it really hard to join against them, for example.
Michael
You seem to want conditional aggregation:
select
max(case when id = 5 then descr end) descr_5,
max(case when id = 12 then descr end) descr_12,
max(case when id = 9 then descr end) descr_9
from dt
where id in (5, 12, 9)
Note that desc is a SQL keyword, hence a poor choice for a column name. I renamed it descr in the query.
You will need multiple joins to the dt table to get the description of each of the "things" you want in a single row:
-- descr is the description column; desc itself is a reserved word
SELECT thing_a_id, dta.descr AS thing_a_desc,
thing_b_id, dtb.descr AS thing_b_desc,
thing_c_id, dtc.descr AS thing_c_desc
FROM wt
JOIN dt dta ON dta.id = wt.thing_a_id
JOIN dt dtb ON dtb.id = wt.thing_b_id
JOIN dt dtc ON dtc.id = wt.thing_c_id
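If one row per thing is preferred over one wide row, the ids can first be turned from columns into rows with UNPIVOT and then joined once. This is a sketch only: the wt and dt CTEs are in-memory stand-ins for the real tables, the description column is assumed to be named descr, and the labels 'a', 'b', 'c' are invented for illustration.

```sql
WITH wt ( thing_a_id, thing_b_id, thing_c_id ) AS (
  SELECT 5, 12, 9 FROM dual
),
dt ( id, descr ) AS (
  SELECT 5,  'flour'  FROM dual UNION ALL
  SELECT 12, 'cups'   FROM dual UNION ALL
  SELECT 9,  'barley' FROM dual
)
SELECT u.thing, u.id, dt.descr
FROM   ( -- turn the three id columns into three (thing, id) rows
         SELECT *
         FROM   wt
         UNPIVOT ( id FOR thing IN ( thing_a_id AS 'a',
                                     thing_b_id AS 'b',
                                     thing_c_id AS 'c' ) )
       ) u
       JOIN dt ON dt.id = u.id
ORDER BY u.thing;
```

This addresses the point raised in the question that the ids arrive as fields of a single row: once they are unpivoted, a single ordinary join is enough.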
I love to play with common table expressions (CTEs), and this is an ideal candidate for one.
In the example below, descriptions and dataset are substitutes for the actual tables you use. I am just building them in memory rather than using an actual table.
In the "breakdown" CTE I am splitting up the CSV value from dataset into multiple rows.
In the last part of the select I am converting everything after the = sign to a number, and then matching that on id from the descriptions CTE. The resulting dataset is, I believe, what you requested.
WITH
descriptions AS
(SELECT 5 AS id, 'flour' AS description FROM DUAL
UNION ALL
SELECT 12 AS id, 'cups' AS description FROM DUAL
UNION ALL
SELECT 9 AS id, 'barley' AS description FROM DUAL),
dataset AS
(SELECT 'wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9' AS result FROM DUAL),
breakdown ( result, REMAINDER ) AS
(SELECT TRIM( SUBSTR( result
, 1
, INSTR( result || ',', ',' ) - 1 ) ) AS result
, TRIM( SUBSTR( result, INSTR( result || ',', ',' ) + 1 ) || ',' ) AS REMAINDER
FROM dataset
UNION ALL
SELECT TRIM( SUBSTR( REMAINDER
, 1
, INSTR( REMAINDER, ',' ) - 1 ) )
, SUBSTR( REMAINDER, INSTR( REMAINDER || ',', ',' ) + 1 ) AS REMAINDER
FROM breakdown
WHERE REMAINDER IS NOT NULL)
SELECT result, TO_NUMBER( TRIM( SUBSTR( result, INSTR( result, '=' ) + 1 ) ) ) AS id, description
FROM breakdown
LEFT OUTER JOIN descriptions
ON TO_NUMBER( TRIM( SUBSTR( breakdown.result, INSTR( breakdown.result, '=' ) + 1 ) ) ) =
descriptions.id
Results:
RESULT              ID  DESCRIPTION
wt.thing_a_id = 5   5   flour
wt.thing_b_id = 12  12  cups
wt.thing_c_id = 9   9   barley

How to remove duplicated values from attribute

I have a column with duplicated values in a single cell. Please tell me how I can remove the duplicated values using SQL or PL/SQL only.
| Test
-+--------------------------------------------------------------------
| 999999999(10145) 999999999(10145) 999999999(10145) 999999999(10145)
|--------------------------------------------------------------------
| 113307425(2) 310122174(2) 310122174(2) 113307425(2)
Use a regular expression with a back-reference to match the repeating terms:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
Query:
SELECT REGEXP_REPLACE( value, '([^ ]+)( \1)+', '\1' ) AS replaced_value
FROM test_data
Output:
| REPLACED_VALUE |
| :------------- |
| 9999999(12345) |
Updated: For new data in the 6th edit:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL UNION ALL
SELECT '113307425(2) 310122174(2) 310122174(2) 113307425(2)' FROM DUAL;
Query:
Use a recursive sub-query factoring clause to find the terms in the string and then use DISTINCT to remove the duplicates and the LISTAGG to concatenate them back into a single string.
WITH bounds ( id, value, start_pos, end_pos ) AS (
SELECT ROWID,
value,
1,
INSTR( value, ' ', 1 )
FROM test_data
UNION ALL
SELECT id,
value,
end_pos + 1,
INSTR( value, ' ', end_pos + 1 )
FROM bounds
WHERE end_pos > 0
),
strings ( id, value ) AS (
SELECT DISTINCT
id,
CASE end_pos
WHEN 0
THEN SUBSTR( value, start_pos )
ELSE SUBSTR( value, start_pos, end_pos - start_pos )
END
FROM bounds
)
SELECT LISTAGG( value, ' ' ) WITHIN GROUP ( ORDER BY value ) AS unique_values
FROM strings
GROUP BY id
Output:
| UNIQUE_VALUES |
| :------------------------ |
| 9999999(12345) |
| 113307425(2) 310122174(2) |
Oracle allows for recursive subquery factoring that can be harnessed to apply regexp based substitutions repeatedly:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
WITH rep(n,s,n_maxrep) AS (
SELECT 1
, value
, 1 + LENGTH(REGEXP_REPLACE(value, '[^ ]', ''))
FROM test_data
UNION ALL
SELECT n+1
, REGEXP_REPLACE ( s, '([^ ]+)(( [^ ]+)*)( \1)+', '\1\2' )
, n_maxrep
FROM rep
WHERE n <= n_maxrep
)
SELECT s FROM rep WHERE n = n_maxrep;
Explanation
The query repeatedly applies the same basic regex-based replacement of a single verb duplicate to the original column. A 'verb' in this context is a maximal sequence of consecutive non-space chars. The duplicates may be next to each other or be separated by other verbs.
The maximum possible number of such replacements is known beforehand: n-1 for n verbs, when all verbs are identical. This is equivalent to the number of occurrences of the separating character in the original value.
Everything else is syntactic sugar. Oracle builds the nested chain of subqueries on its own.
Note that the limit n_maxrep is actually 1 + <number_of_separator_occurrences>. This is necessary as the base case ( n=1 ) does no replacement.

Regexp_replace processing result

I have a string with groups of numbers, and I'd like to make each group a constant length. For now I use two regexp_replace calls: the first prepends ten zeros to each group, and the second cuts each group down to its last 10 digits:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace(
regexp_replace(txt, '(\d+)','0000000000\1')
,'\d+(\d{10})','\1') from s ;
But I'd like to use only one regex, something like
regexp_replace(txt, '(\d+)',lpad('\1',10,'0'))
But it doesn't work: lpad is executed before the regexp. Do you have any ideas?
With a slightly different approach, you can try the following:
with s(id, txt) as
(
select rownum, txt
from (
select '1030123:12031:1341' as txt from dual union all
select '1234:0123456789:1341' from dual
)
)
SELECT listagg(lpad(regexp_substr(s.txt, '[^:]+', 1, lines.column_value), 10, '0'), ':') within group (order by column_value) txt
FROM s,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual CONNECT BY instr(s.txt, ':', 1, LEVEL - 1) > 0
) AS sys.odciNumberList )) lines
group by id
TXT
-----------------------------------
0001030123:0000012031:0000001341
0000001234:0123456789:0000001341
This uses CONNECT BY to split every string on the separator ':', then uses LPAD to pad each piece to 10 characters, and finally aggregates the pieces to build rows containing the concatenation of the padded values.
This works for non-empty sequences (e.g. 123::456)
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace (regexp_replace (txt,'(\d+)',lpad('0',10,'0') || '\1'),'0*(\d{10})','\1')
from s
;

Sum the numbers in a string in Oracle

Below is an interview question. Can someone please help me solve it?
select 'a1b2c3d4e5f6g7' from dual;
Output is sum of given integer number(1+2+3+4+5+6+7)=28.
Any help?
Use a regex to strip the letters, then use CONNECT BY to sum each remaining digit:
With T
as (select regexp_replace('a1b2c3d4e5f6g7', '[A-Za-z]') as col from dual)
select sum(val)
From
(
select substr(col,level,1) val from t connect by level <= length(col)
)
Since it is only 1 digit numbers you can use SUBSTR() to extract every other character:
Query 1:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7' from dual
)
SELECT SUM( TO_NUMBER( SUBSTR( value, 2*LEVEL, 1 ) ) ) AS total
FROM data
CONNECT BY 2 * LEVEL <= LENGTH( value )
Results:
| TOTAL |
|-------|
| 28 |
However, if you have two digit numbers then you can do:
Query 2:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7h8i9j10' from dual
)
SELECT SUM( TO_NUMBER( REGEXP_SUBSTR( value, '\d+', 1, LEVEL ) ) ) AS total
FROM data
CONNECT BY LEVEL <= REGEXP_COUNT( value, '\d+' )
Results:
| TOTAL |
|-------|
| 55 |
You can use regexp_substr to extract exactly the numbers, then just sum them:
with t as (select 'a1b2c3d4e5f6g7' expr from dual)
select sum(regexp_substr(t.expr, '[0-9]+',1, level)) as col
from t -- select from the CTE; t.expr is not visible when selecting from dual
connect by level < regexp_instr(t.expr, '[0-9]+',1, level);
example:
select sum(regexp_substr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level)) as col
from dual
connect by level < regexp_instr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level);
Result:
54
This solution works with numbers with more than 1 digit and it doesn't matter how many characters are between the numbers:
with t as (select 'a1b2c3d4e5f6g7' as str from dual)
select sum(to_number(regexp_substr(str,'[0-9]+',1,level)))
from t
connect by regexp_substr(str,'[0-9]+',1,level) is not null
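The queries above all work on a single input string. When the source has several rows, a bare CONNECT BY over the table would generate rows across all of them, so one possible sketch is to move the row generator into a correlated scalar subquery that runs once per row. The CTE t, its id column, and the second sample string are made up for illustration; REGEXP_COUNT requires Oracle 11g or later.

```sql
WITH t ( id, str ) AS (
  SELECT 1, 'a1b2c3d4e5f6g7' FROM dual UNION ALL
  SELECT 2, 'x10y20'         FROM dual
)
SELECT t.id,
       t.str,
       ( -- generate one row per number found in this row's string, then sum them
         SELECT SUM( TO_NUMBER( REGEXP_SUBSTR( t.str, '[0-9]+', 1, LEVEL ) ) )
         FROM   dual
         CONNECT BY LEVEL <= REGEXP_COUNT( t.str, '[0-9]+' )
       ) AS total
FROM   t
ORDER BY t.id;
```

For the two sample rows the totals should be 28 and 30; a string with no digits at all yields NULL rather than 0.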