REGEXP_SUBSTR : extracting portion of string between [ ] including []

I am on Oracle 11gR2.
I am trying to extract the text between '[' and ']', including the brackets themselves.
Example:
select regexp_substr('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]','\[(.*)\]') from dual
Output:
[REQ.UID] and username=[REQD.VP.UNAME]
Output needed:
[REQ.UID][REQD.VP.UNAME]
Please let me know how to get the needed output.
Thanks & Regards,
Bishal

Assuming there will only ever be two occurrences of [...], the following should suffice. The ? in .*? makes that part of the match non-greedy, so it doesn't gobble up everything through the last ].
select
regexp_replace('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]'
,'.*(\[.*?\]).*(\[.*?\]).*','\1\2')
from dual
;
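You can check the difference between the greedy \[(.*)\] in the question and a non-greedy match outside Oracle. The sketch below uses Python's re module, not Oracle's regex engine, so treat it only as an illustration. It also shows one way to collect any number of [...] tokens instead of exactly two, using a negated class [^\]]* in place of a lazy quantifier. The helper name extract_bracketed is hypothetical.

```python
import re

SQL = ("select userid,username from tablename "
       "where user_id=[REQ.UID] and username=[REQD.VP.UNAME]")

# Greedy: .* runs to the LAST ']' in the string, which is what the question saw.
greedy = re.search(r"\[(.*)\]", SQL).group(0)

# Non-greedy: .*? stops at the FIRST ']' after each '['.
lazy_tokens = re.findall(r"\[.*?\]", SQL)


def extract_bracketed(text: str) -> str:
    """Concatenate every [...] token, brackets included (any number of tokens)."""
    # [^\]]* cannot cross a ']', so each match ends at its own closing bracket.
    return "".join(re.findall(r"\[[^\]]*\]", text))


print(greedy)                  # [REQ.UID] and username=[REQD.VP.UNAME]
print(lazy_tokens)             # ['[REQ.UID]', '[REQD.VP.UNAME]']
print(extract_bracketed(SQL))  # [REQ.UID][REQD.VP.UNAME]
```

In Oracle, the same negated-class pattern could probably be passed to REGEXP_SUBSTR with an increasing occurrence argument to pick out each token in turn. That part is untested here.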

I'm not an Oracle user, but from a quick perusal of the docs, I think this should be close:
REGEXP_REPLACE('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]',
'^[^\[]*(\[[^\]]*\])[^\[]*(\[[^\]]*\])$', '\1\2')
It looks much nastier than it is.
Pattern is:
^[^\[]* Match all characters up to (but not including) the first [
(\[[^\]]*\]) Capture into group 1 anything like [<not "]">]
[^\[]* Match everything up to (but not including) the next [
(\[[^\]]*\]) Capture into group 2 anything like [<not "]">], at the end of the string
Then the replacement is simple: just <grp 1><grp 2>, with no space between them, which gives the requested output.
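Python's re module accepts the same anchored pattern, so here is a quick sketch of what it produces. This is Python, not Oracle, so Oracle's engine may behave differently. Because the pattern is anchored with ^...$ and has room for exactly two groups, a string with a third [...] token does not match at all and comes back unchanged.

```python
import re

# Anchored pattern: exactly two [...] tokens, the second one at the end of the string.
PATTERN = r"^[^\[]*(\[[^\]]*\])[^\[]*(\[[^\]]*\])$"

two = ("select userid,username from tablename "
       "where user_id=[REQ.UID] and username=[REQD.VP.UNAME]")
three = two + " and x=[EXTRA]"  # hypothetical third token

print(re.sub(PATTERN, r"\1\2", two))    # [REQ.UID][REQD.VP.UNAME]
print(re.sub(PATTERN, r"\1\2", three))  # no match, so the input is returned unchanged
```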

Related

Parse string as JSON with Snowflake SQL

I have a field in a table of our db that works like an event payload, where all changes to different entities are gathered. See the example below for a single field of the object:
'---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc'
Since accessing this field with pure SQL is a pain, I was thinking of parsing it as JSON so that it would look like this:
{
"field_one":"1",
"field_two": "20",
"field_three": "4",
"id": "1234",
"another_id": "5678",
"some_text": "Hey you",
"a_date": "2022-11-29",
"utc": "2022-11-29 15:29:28.159296000 Z",
"another_date": "2022-11-30",
"utc": "2022-11-30 13:34:59.000000000 Z"
}
And then just use a Snowflake-native approach to access the values I need.
As you can see, though, there are two fields called utc: one refers to the first date (a_date), and the other refers to the second date (another_date). I believe these are nested in the object, but it's difficult to tell from the format of the field.
This is a problem because both keys have the same name, so I can't tell one utc from the other once I reformat the string and run parse_json() on it.
My SQL so far looks like the following:
select
object,
replace(object, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last
from my_table
(The third and fourth steps are needed because some fields contain extra spaces.)
This actually gives me the format I need, but because of the duplicate utc keys I mentioned, I cannot parse the string as JSON.
Also note that the structure of the string might change from row to row, meaning that some rows might gather two utc keys, while others might have one, and others even five.
Any ideas on how to overcome that?

Rename only one of the duplicates with regexp_replace(), using its occurrence argument (the fifth one) to target the second "utc":
with data as (
select '---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc' o
)
select parse_json(last2)
from (
select o,
replace(o, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last,
regexp_replace(last, '"utc"', '"utc2"', 1, 2) last2
from data
)
;
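The regexp_replace above renames only the second utc (occurrence 2), but the question says a row may carry anywhere from one to five of them. A more general idea is to number every repeated key. The sketch below shows that logic in Python, not Snowflake SQL. payload_to_dict and the _2, _3 suffix scheme are hypothetical choices, and a real key that already ends in something like _2 could collide. Snowflake does support Python UDFs, so something along these lines might be wrapped in one, but that hasn't been tried here.

```python
import json
from collections import Counter

PAYLOAD = ("---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\n"
           "another_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\n"
           "utc: this_utc\nanother_date: 2022-11-30\nutc: another_utc")


def payload_to_dict(payload: str) -> dict:
    """Turn 'key: value' lines into a dict, renaming repeated keys key_2, key_3, ..."""
    seen = Counter()
    result = {}
    for line in payload.splitlines():
        if not line.strip() or line.strip() == "---":
            continue  # skip blank lines and the '---' marker
        key, _, value = line.partition(": ")  # split on the first ': ' only
        seen[key] += 1
        name = key if seen[key] == 1 else f"{key}_{seen[key]}"
        result[name] = value
    return result


obj = payload_to_dict(PAYLOAD)
print(json.dumps(obj))  # valid JSON: utc and utc_2 are now distinct keys
```

Splitting on the first ": " also keeps spaces inside values such as "Hey you", which a blanket REPLACE of ' ' would remove.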

This may not be what you want, but it seems to me that your problem could be solved by having each UTC timestamp replace the date that precedes it, so that the keys are no longer duplicated. You can always derive the dates once you have the timestamps. If that makes sense, see if you can apply your parse_json solution to this output instead:
set str='---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: 2022-11-29 15:29:28.159296000 Z\nanother_date: 2022-11-30\nutc: 2022-11-30 13:34:59.000000000 Z';
select regexp_replace($str,'[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc:');
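Here is the same substitution tried in Python's re module, as an illustration rather than Snowflake itself. In Snowflake, the \n inside a single-quoted literal becomes a real newline, and the Python string below does the same. With no replacement argument, the matched text is simply removed.

```python
import re

S = ("---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\n"
     "another_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\n"
     "utc: 2022-11-29 15:29:28.159296000 Z\nanother_date: 2022-11-30\n"
     "utc: 2022-11-30 13:34:59.000000000 Z")

# Drop each bare date together with the following "\nutc:" so the timestamp
# becomes the value of a_date / another_date and no utc key remains.
merged = re.sub(r"[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc:", "", S)
print(merged)
```

Note that a double space is left after each colon (for example "a_date:  2022-11-29 15:29:28.159296000 Z"), so some whitespace clean-up is still needed.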

Pattern match using regexp_extract_all

I am trying to build an array from this string and need help with the pattern for REGEXP_EXTRACT_ALL.
Here is my input string, which contains keyword/value pairs:
BEGIN
DECLARE p_JSON STRING DEFAULT """
{
"instances": [{
"LT_20MN_SalesContrctCnt": 388.0,
"Pyramid_Index": '',
"MARKET": "'Growth Markets','Europe'",
"SERVICE_DIM": "'S&C','F&M'",
"SG_MD": "'All Service Group'"
}]}
""";
SELECT split(x,":")[OFFSET(0)] as keyword, split(x,":")[OFFSET(1)] keyword_value
FROM unnest(split(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'),r'([\'\"\[\]{}])', ''))) as x
END;
The SQL above fails at SPLIT because of the commas within the data.
All I am trying to do here is build two columns: keyword and value.
The idea is that if I can extract each pair using REGEXP_EXTRACT_ALL without the trailing ",", I should be able to split it into keyword and keyword_value columns. By the way, the names and number of keywords/values are not fixed.
Intended output from REGEXP_EXTRACT_ALL:
"LT_20MN_SalesContrctCnt": 388.0
"Pyramid_Index": ''
"MARKET": "'Growth Markets','Europe'"
"SERVICE_DIM": "'S&C','F&M'"
"SG_MD": "'All Service Group'"
Appreciate if you can suggest a better way to handle this.
Thanks in advance.

Using your sample data, I just added an extra REGEXP_REPLACE that replaces ," with #", so the string can be split on # instead of ,. See the approach below:
SELECT
SPLIT(arr,":")[OFFSET(0)] as keyword,
SPLIT(arr,":")[OFFSET(1)] as keyword_value,
FROM sample_data,
UNNEST(SPLIT(REGEXP_REPLACE(REGEXP_REPLACE(JSON_EXTRACT(p_JSON, '$.instances'),r'[\[\]{}]',''),r',"','#"'),'#')) arr
Output:
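Since the question asks specifically for a REGEXP_EXTRACT_ALL pattern, here is one candidate, tried out in Python's re module rather than BigQuery. BigQuery uses RE2, which supports the non-capturing group and alternation used here, but this pattern hasn't been run in BigQuery. Each match is a "key": value pair where the value is a double-quoted string, a single-quoted string, or a bare token. The pattern has no capturing group, so the whole match is returned, which lines up with the intended output in the question. The pattern assumes keys contain no double quotes and bare values contain no commas, braces or whitespace.

```python
import re

# Body of the "instances" object from the question, kept as raw text
# (the '' value is not valid JSON).
BODY = """
"LT_20MN_SalesContrctCnt": 388.0,
"Pyramid_Index": '',
"MARKET": "'Growth Markets','Europe'",
"SERVICE_DIM": "'S&C','F&M'",
"SG_MD": "'All Service Group'"
"""

# "key" : then a double-quoted string, a single-quoted string, or a bare token.
PAIR = r'''"[^"]+"\s*:\s*(?:"[^"]*"|'[^']*'|[^,}\s]+)'''

pairs = re.findall(PAIR, BODY)  # whole matches, e.g. '"SG_MD": "\'All Service Group\'"'

rows = []
for p in pairs:
    key, value = p.split(":", 1)  # split on the FIRST colon only
    rows.append((key.strip().strip('"'), value.strip()))

for row in rows:
    print(row)
```

The split is on the first colon only. BigQuery's SPLIT(x, ":") splits on every colon, so a value that itself contains ':' would need extra care there.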

Can we Use "Case" in a ColdFusion Query-of-Query

I am using CASE in a ColdFusion query of queries, but it throws an error.
Query:
<cfquery name="qEmployees1" dbtype="query">
select (
case
when ISNUMERIC(u.userdefined)=1
then right('00000000'+u.userdefined,8)
else userdefined
end
) as hello
from all_employees
order by hello ASC
</cfquery>
Error message:
Encountered "when" at line 3, column 22. Was expecting one of:
"AND" ... "BETWEEN" ... "IN" ... "IS" ... "LIKE" ... "NOT" ...
"OR" ... ")" ... "=" ... "." ... "!=" ... "#" ... "<>" ...
">" ... ">=" ... "<" ... "<=" ... "+" ... "-" ... "*" ...
"||" ... "/" ... "**" ... "(" ...

Update:
The original suggestion isn't going to work because it only looks at a single row. Really, you need to loop through your all_employees recordset and apply the logic to each individual row.
You might be able to achieve this without QoQ if you are just outputting the results to the page. Like this:
<cfoutput>
<cfloop query="all_employees">
<cfif isNumeric(all_employees.userdefined)>
#Right('00000000'&all_employees.userdefined,8)#
<cfelse>
#all_employees.userdefined#
</cfif>
</cfloop>
</cfoutput>
Original Answer:
How about something like this?:
<cfquery name="qEmployees1" dbtype="query">
SELECT
<cfif isNumeric([all_employees].[u.userdefined])>
right('00000000'+u.userdefined,8)
<cfelse>
u.userdefined
</cfif> AS hello
FROM all_employees
ORDER by hello
</cfquery>
I have not tested this but I don't think having dot notation in the SQL column name will work correctly in this case. I enclosed it in square brackets anyway.

In case anyone else decides to try the QoQ below, one very important thing to note: even if it executes without error, it's NOT doing the same thing as CASE. A CASE expression applies its logic to the values in each row of a table, individually. In the QoQ version, the CFIF expression does not operate on all values within the query. It is evaluated once, while ColdFusion builds the SQL string, so it only examines the value in the first row and then applies the decision for that one value to ALL rows in the query.
Notice how the QoQ below (incorrectly) reports that all of the values are numeric, while the database query (correctly) reports a mix of "Numeric" and "Non-numeric" values. So the QoQ code is not equivalent to CASE.
TestTable Data:
id userDefined
1 22
2 AA
3 BB
4 CC
Database Query:
SELECT CASE
WHEN ISNUMERIC(userDefined)=1 THEN 'Number: '+ userDefined
ELSE 'Not a number: ' + userDefined
END AS TheColumnAlias
FROM TestTable
ORDER BY ID ASC
Database Query Result:
QoQ
<cfquery name="qQueryOfQuery" dbtype="query">
SELECT
<cfif isNumeric(qDatabaseQuery2.userDefined)>
'Number: '+ userDefined
<cfelse>
'Not a number: ' + userDefined
</cfif>
AS TheColumnAlias
FROM qDatabaseQuery2
ORDER by ID
</cfquery>
QoQ Result
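The row-by-row difference can be simulated outside ColdFusion. The Python sketch below is only a model of the behaviour described above, not CF code. is_numeric is a hypothetical, rough stand-in for ISNUMERIC and does not reproduce its exact rules.

```python
# Sample rows from the TestTable above: (id, userDefined).
ROWS = [(1, "22"), (2, "AA"), (3, "BB"), (4, "CC")]


def is_numeric(s: str) -> bool:
    """Rough stand-in for ISNUMERIC: True if float() accepts the string."""
    try:
        float(s)
        return True
    except ValueError:
        return False


# CASE: the test runs once PER ROW.
case_result = [("Number: " if is_numeric(v) else "Not a number: ") + v
               for _, v in ROWS]

# <cfif> inside a QoQ: the test runs ONCE, against the first row's value,
# and the chosen branch is then applied to every row.
prefix = "Number: " if is_numeric(ROWS[0][1]) else "Not a number: "
cfif_result = [prefix + v for _, v in ROWS]

print(case_result)  # mixed, like the database query
print(cfif_result)  # every row labelled "Number: ", like the QoQ
```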

EDIT:
I thought about this one and decided to turn it into an actual answer. Since you're using CF2016+, you have access to some of the more modern features that CF offers. First, Query of Queries is a great tool, but it can be very slow, especially for lower record counts. And if there are a lot of records in your base query, it can eat up your server's memory, since it's an in-memory operation. We can accomplish our goal without a QoQ.
One way to roughly duplicate the functionality you're looking for is with some of the newer CF functions: filter, each and sort all work on a query object. I've used the member-function versions because I think they look cleaner, and I've written everything in cfscript syntax.
I mostly reused the CFScript query from my original answer (all_employees), which creates the query object, but I added an f column to it that holds the text to filter on.
all_employees = QueryNew("userdefined,hello,f", "varchar,varchar,varchar",
[
["test","pure text","takeMe"],
["2","number as varchar","takeMe"],
["03","leading zero","takeMe"],
[" 4 ","leading and trailing spaces","takeMe"],
["5 ","extra trailing spaces","takeMe"],
[" 6","extra leading spaces","takeMe"],
["aasdfadsf","adsfasdfasd","dontTakeMe"],
["165e73","scientific notation","takeMe"],
["1.5","decimal","takeMe"],
["1,5","comma-delimited (or non-US decimal)","takeMe"],
["1.0","valid decimal","takeMe"],
["1.","invalid decimal","takeMe"],
["1,000","number with comma","takeMe"]
]
) ;
The original base query didn't have a WHERE clause, so no additional filtering was being done on the initial results. But if we needed to, we could duplicate that with QueryFilter or .filter.
filt = all_employees.filter( function(whereclause){ return ( whereclause.f == "takeMe"); } ) ;
This takes the all_employees query and applies a function that will only return rows that match our function requirements. So any row of the query where f == "takeMe". That's like WHERE f = 'takeMe' in a query. That sets the new filtered results into a new query object filt.
Then we can use QueryEach or .each to go through every row of our new filtered query to modify what we need to. In this case, we're building a new array for the values we want. A for/in loop would probably be faster; I haven't tested.
retval = [] ; // the new array that will hold the results
filt.each(
function(r) {
retval.append(
ISNUMERIC(r.userDefined) ? right("00000000"&ltrim(rtrim(r.userdefined)),8) : r.userDefined
) ;
}
) ;
Now that we have a new array with the results we want, the original QoQ wanted to order those results. We can do this with ArraySort or .sort.
retval.sort("textnocase") ;
In my test, CF2016's retval.sort() returned a boolean rather than the sorted array, whereas CF2018 returned the array. That is expected, since the return type changed in CF2018. Either way, both versions sort the retval array, so when we dump retval it's in the chosen order.
And as I always suggest, load test on your system with your data. Like I said, this is only one way to go about what you're trying to do. There are others that may be faster.
https://cffiddle.org/app/file?filepath=dedd219b-6b27-451d-972a-7af75c25d897/54e5559a-b42e-4bf6-b19b-075bfd17bde2/67c0856d-bdb3-4c92-82ea-840e6b8b0214.cfm
(CF2018) > https://trycf.com/gist/2a3762dabf10ad695a925d2bc8e55b09/acf2018?theme=monokai
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/queryfilter.html
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-m-r/queryeach.html
https://helpx.adobe.com/coldfusion/cfml-reference/coldfusion-functions/functions-a-b/arraysort.html
ORIGINAL:
This is more of a comment than an answer, but it's much too long for a comment.
I wanted to mention a couple of things to watch out for.
First, ColdFusion's isNumeric() can sometimes give unexpected results. It doesn't really check whether a value is a number; it checks whether a string can be converted to a number. So there are all sorts of values that isNumeric() will treat as numeric. EX: 1e3 is scientific notation for 1000, so isNumeric("1e3") returns true.
My second suggestion concerns leading and trailing spaces in a "numeric" value, EX: " 4 ". isNumeric() will return true for this one, but when you prepend the zeros and take the right 8 characters, the spaces survive and you get "00000 4 " instead of "00000004". My suggestion is to wrap your column in val() or ltrim(rtrim()). val() reduces it to a basic number (" 1.0 " >> "1"), while ltrim(rtrim()) keeps the number as written but strips the spaces (" 1.0 " >> "1.0") and also retains the scientific-notation value (" 1e3 " >> "1e3"). Both still miss 1,000, so if that's a concern you'll need to handle it separately. But the method you use depends entirely on the values your data contains. Number verification isn't always as easy as it seems it should be.
I've always been a firm believer in GIGO -- Garbage In, Garbage Out. I see basic data cleansing as part of my job. But if it's extreme or regular, I'll tell the source to fix it or their stuff won't work right. When it comes to data, it's impossible to account for all possibilities, but we can check for common expectations. It's always easier to whitelist than it is to blacklist.
<cfscript>
all_employees = QueryNew("userdefined,hello", "varchar,varchar",
[
["test","pure text"],
["2","number as varchar"],
["03","leading zero"],
[" 4 ","leading and trailing spaces"],
["5 ","extra trailing spaces"],
[" 6","extra leading spaces"],
["165e73","scientific notation"],
["1.5","decimal"],
["1,5","comma-delimited (or non-US decimal)"],
["1.0","valid decimal"],
["1.","invalid decimal"],
["1,000","number with comma"]
]
)
//writedump(all_employees) ;
retval = [] ;
for (r in all_employees) {
retval.append(
{
"1 - RowInput" : r.userdefined.replace(" ","*","all") , // Replace space with * for output visibility.
"2 - IsNumeric?" : ISNUMERIC(r.userdefined) ,
"3 - FirstOutput": ( ISNUMERIC(r.userDefined) ? right("00000000"&r.userdefined,8) : r.userDefined ) ,
"4 - ValOutput" : ( ISNUMERIC(r.userDefined) ? right("00000000"&val(r.userdefined),8) : r.userDefined ) ,
"5 - TrimOutput" : ( ISNUMERIC(r.userDefined) ? right("00000000"&ltrim(rtrim((r.userdefined))),8) : r.userDefined )
}
) ;
}
writeDump(retval) ;
</cfscript>
https://trycf.com/gist/03164081321977462f8e9e4916476ed3/acf2018?theme=monokai
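To see the space problem concretely, here is a small Python model of right("00000000" & value, 8) with and without trimming. cf_right and pad8 are hypothetical helpers that mimic CF's right() and ltrim(rtrim()). This is not ColdFusion, and it says nothing about which values isNumeric() accepts.

```python
def cf_right(s: str, n: int) -> str:
    """Mimic CF's right(s, n) for n >= 1: the last n characters of s."""
    return s[-n:]


def pad8(value: str, trim: bool) -> str:
    """right("00000000" & value, 8), optionally stripping spaces first like ltrim(rtrim())."""
    if trim:
        value = value.strip(" ")
    return cf_right("00000000" + value, 8)


for raw in [" 4 ", "5 ", " 6", "1.0", "1e3"]:
    # repr() makes the leftover spaces visible in the output
    print(repr(raw), repr(pad8(raw, trim=False)), repr(pad8(raw, trim=True)))
```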

What are you trying to do exactly? Please share some context of the goal for your post.
To me it looks like your query may not be formatted properly. It would evaluate to something like:
select ( 0000000099
) as hello
from all_employees
order by hello ASC
Try doing this. Put a <cfabort> right here... And then let me know what query was produced on the screen when you run it.
<cfquery name="qEmployees1" dbtype="query">
select (
case
when ISNUMERIC(u.userdefined)=1
then right('00000000'+u.userdefined,8)
else userdefined
end
) as hello
from all_employees
order by hello ASC
<cfabort>
</cfquery>

<cfquery name="qEmployees1" dbtype="query">
SELECT
(
<cfif isNumeric(all_employees.userdefined)>
right('00000000'+all_employees.userdefined,8)
<cfelse>
all_employees.userdefined
</cfif>
) AS hello
FROM all_employees
ORDER by hello
</cfquery>
This is the syntax-error-free version, thanks to @volumeone.

Check constraint for Emails in an Oracle Database

I've searched everywhere for a decent and logical CHECK constraint to validate that an email is in the right format. So far I've found really long and unnecessary expressions like:
create table t (
email varchar2(320) check (
regexp_like(email, '[[:alnum:]]+@[[:alnum:]]+\.[[:alnum:]]')
)
);
and
create table stk_t (
email varchar2(320) check (
email LIKE '%@%.%' AND email NOT LIKE '@%' AND email NOT LIKE '%@%@%'
)
);
Surely there is a simpler way?
I'm using Oracle 11g database and SQL Developer IDE.
This is what I have:
constraint Emails_Check check (Emails LIKE '%_@%_._%')
Can someone please let me know if this is the most efficient way of validating emails?

You can try this
email varchar2(255) check (
email LIKE '%@%.%' AND email NOT LIKE '@%' AND email NOT LIKE '%@%@%' )
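To see what these three LIKE conditions (written with the @ sign) actually let through, here is a rough Python model of LIKE. It handles % and _ only, has no ESCAPE support, and is case-sensitive. like and passes_check are hypothetical helpers, not Oracle's implementation. By this model, strings such as a@.c and a@b. still pass, so the check works as a cheap sanity filter rather than real validation.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Rough model of SQL LIKE without ESCAPE: % = any run of chars, _ = one char."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def passes_check(email: str) -> bool:
    """The three LIKE conditions from the CHECK constraint."""
    return (like(email, "%@%.%")
            and not like(email, "@%")
            and not like(email, "%@%@%"))


for e in ["john.doe@example.com", "@example.com", "a@b@c.com", "a@.c", "a@b."]:
    print(e, passes_check(e))
```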

CREATE TABLE MYTABLE(
EMAIL VARCHAR2(30) CHECK(REGEXP_LIKE (EMAIL,'^[A-Za-z]+[A-Za-z0-9.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$'))
)
Explanation of a related, more thorough regular expression (note that it is not the same pattern as the CHECK above, and the doubled \\. is string-literal escaping from another language; in an Oracle literal a single \. is enough):
^ #start of the line
[_A-Za-z0-9-]+ # must start with one or more (+) characters from the bracket [ ]
( # start of group #1
\\.[_A-Za-z0-9-]+ # followed by a dot "." and one or more (+) characters from the bracket [ ]
)* # end of group #1; this group is optional (*)
@ # must contain an "@" symbol
[A-Za-z0-9]+ # followed by one or more (+) characters from the bracket [ ]
( # start of group #2 - first level TLD checking
\\.[A-Za-z0-9]+ # followed by a dot "." and one or more (+) characters from the bracket [ ]
)* # end of group #2; this group is optional (*)
( # start of group #3 - second level TLD checking
\\.[A-Za-z]{2,} # followed by a dot "." and at least 2 characters from the bracket [ ]
) # end of group #3
$ #end of the line
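Here is a quick way to see what the CHECK pattern in this answer accepts. It is run through Python's re module, which reads this particular pattern the same way, though this is not Oracle. The sample addresses are made up. They show that the local part needs at least two characters and must start with a letter, underscores are rejected, and the top-level domain is limited to 2 to 4 letters. Also keep in mind that the column itself is only VARCHAR2(30).

```python
import re

# Pattern from the CHECK constraint (with @ as the separator).
EMAIL_RE = r"^[A-Za-z]+[A-Za-z0-9.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"

samples = {
    "john.doe@example.com": True,
    "ab@example.co": True,
    "a@example.com": False,         # local part needs at least two characters
    "1john@example.com": False,     # must start with a letter
    "john@example.museum": False,   # TLD limited to 2-4 letters
    "john_doe@example.com": False,  # underscore not allowed before the @
}

for email, expected in samples.items():
    got = re.search(EMAIL_RE, email) is not None
    print(f"{email:25} matched={got} expected={expected}")
```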

Stumbled upon this answer while hunting for a simple solution on the internet:
ALTER TABLE YourTableName
ADD CONSTRAINT YourConstraintName CHECK(YourColumnName LIKE '%___@___%.__%')
All credit goes to @bhanu_nz here.

Postgresql regexp_replace 'g' flag

I have this string:
[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]
I need to remove [[ when the word after it isn't followed by a |.
What I do:
select regexp_replace('[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]',
'\[\[([^\|]+?(\[\[|\Z))', '\1', 'g')
What I get:
[[good|12345]] bad1 [[bad2 bad3 [[bad4 bad5 [[good|12345]]
What I want to get:
[[good|12345]] bad1 bad2 bad3 bad4 bad5 [[good|12345]]
It looks like the [[ matched by the end of my regexp is not available to the next iteration of the match.

You should use a look-ahead instead of a group:
select regexp_replace('[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]', '\[\[([^\|]+?(?=\[\[|\Z))', '\1', 'g')
The (?=\[\[|\Z) look-ahead only checks for the presence of [[ (or the end of the string). It does not consume those characters, i.e. it does not advance the match position through the string. Thus, the following [[ remains available for the next match.
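The difference is easy to reproduce in Python's re module, which supports the same (?=...) look-ahead and \Z anchor. This is Python rather than PostgreSQL, so it only illustrates the idea. re.sub replaces every match by default, which plays the role of the 'g' flag here.

```python
import re

S = "[[good|12345]] [[bad1 [[bad2 [[bad3 [[bad4 [[bad5 [[good|12345]]"

# Question's pattern: the inner group CONSUMES the next "[[", so the following
# match cannot start there and every other "[[" survives.
consuming = re.sub(r"\[\[([^|]+?(\[\[|\Z))", r"\1", S)

# Answer's pattern: the look-ahead only checks for "[[" (or end of string)
# without consuming it, so each "[[" can start the next match.
lookahead = re.sub(r"\[\[([^|]+?(?=\[\[|\Z))", r"\1", S)

print(consuming)  # [[good|12345]] bad1 [[bad2 bad3 [[bad4 bad5 [[good|12345]]
print(lookahead)  # [[good|12345]] bad1 bad2 bad3 bad4 bad5 [[good|12345]]
```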
