Extract a substring from a string in Oracle SQL

I have the following string:
(,QUESTION-3914~Please enter the unique identification number associated with the IRQ.|3~Greater Than|5~5,AND,QUESTION-3920~Select the contract action that applies to this IRQ.|5~Equal To|LOV1274~New Agreement,),AND,QUESTION-3921~If "New Agreement#comma#" which type of New Agreement is being requested?|5~Equal To|y~Yes,OR,NOT,(,QUESTION-3923~Will the Third Party Relationship support the implementation#comma# pilot#comma# launch#comma# or operation of a New Activity (as defined in the New Activity Risk Management Policy)?|5~Equal To|y~Yes,)
The required output for this string is:
(,QUESTION-3914|3|5,AND,QUESTION-3920|5|LOV1274,),AND,QUESTION-3921|5|y,OR,NOT,(,QUESTION-3923)|5|y,)
What should I do to achieve this?

Use a regular expression to replace everything from each tilde (~) character up to, but not including, the next comma (,) or pipe (|) character:
Oracle Setup:
CREATE TABLE your_data ( input_string ) AS
SELECT '(,QUESTION-3914~Please enter the unique identification number associated with the IRQ.|3~Greater Than|5~5,AND,QUESTION-3920~Select the contract action that applies to this IRQ.|5~Equal To|LOV1274~New Agreement,),AND,QUESTION-3921~If "New Agreement#comma#" which type of New Agreement is being requested?|5~Equal To|y~Yes,OR,NOT,(,QUESTION-3923~Will the Third Party Relationship support the implementation#comma# pilot#comma# launch#comma# or operation of a New Activity (as defined in the New Activity Risk Management Policy)?|5~Equal To|y~Yes,)' FROM DUAL
Query:
SELECT REGEXP_REPLACE( input_string, '~.*?([|,])', '\1' ) AS output
FROM your_data d
Output:
| OUTPUT |
| :--------------------------------------------------------------------------------------------------- |
| (,QUESTION-3914|3|5,AND,QUESTION-3920|5|LOV1274,),AND,QUESTION-3921|5|y,OR,NOT,(,QUESTION-3923|5|y,) |
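An equivalent variant, as a sketch under the same setup, is to delete the tilde and every following non-delimiter character outright, leaving the , or | in place instead of capturing it; this also strips a trailing ~ segment that has no delimiter after it:
SELECT REGEXP_REPLACE( input_string, '~[^|,]*' ) AS output
FROM your_data d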

Related

How would I remove specific words or text from a KQL query?

I have the following query, which provides all the data I need to export, but I would like the text '' removed from my final result. How would I achieve this?
| where type == "microsoft.security/assessments"
| project id = tostring(id),
Vulnerabilities = properties.metadata.description,
Severity = properties.metadata.severity,
Remediations = properties.metadata.remediationDescription
| parse kind=regex id with '/virtualMachines/' Name '/providers/'
| where isnotempty(Name)
| project Name, Severity, Vulnerabilities, Remediations
You could use replace_string() (https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/replace-string-function) to replace any substring with an empty string.
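For example, a minimal sketch of the final projection, where "text-to-remove" is a hypothetical stand-in for the elided text (replace_string() needs a string input, hence the tostring()):
| where type == "microsoft.security/assessments"
| project id = tostring(id),
Vulnerabilities = replace_string(tostring(properties.metadata.description), "text-to-remove", ""),
Severity = properties.metadata.severity,
Remediations = properties.metadata.remediationDescription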

SAP HANA placeholders: pass a * parameter with arrow notation

Trying to pass a star (*) in a SQL HANA placeholder with arrow notation.
The following works OK:
Select * FROM "table_1"
( PLACEHOLDER."$$IP_ShipmentStartDate$$" => '2020-01-01',
PLACEHOLDER."$$IP_ShipmentEndDate$$" => '2030-01-01' )
In the following, when trying to pass a *, I get a syntax error:
Select * FROM "table1"
( PLACEHOLDER."$$IP_ShipmentStartDate$$" => '2020-01-01',
PLACEHOLDER.'$$IP_ItemTypecd$$' => '''*''',
PLACEHOLDER."$$IP_ShipmentEndDate$$" => '2030-01-01' )
The reason I am using the arrow notation is that it's the only way I know of that allows passing parameters, as in the example below (from the linked post):
do begin
declare lv_param nvarchar(100);
select max('some_date')
into lv_param
from dummy /*your_table*/;
select * from "_SYS_BIC"."path.to.your.view/CV_TEST" (
PLACEHOLDER."$$P_DUMMY$$" => :lv_param
);
end;
There's a typo in your code: you need double quotes around the parameter name, but you have single quotes. It should be: PLACEHOLDER."$$IP_ItemTypecd$$".
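Applied to your statement, a minimal corrected sketch (passing the plain value, which is all you need per the next paragraph) would be:
Select * FROM "table1"
( PLACEHOLDER."$$IP_ShipmentStartDate$$" => '2020-01-01',
PLACEHOLDER."$$IP_ItemTypecd$$" => '*',
PLACEHOLDER."$$IP_ShipmentEndDate$$" => '2030-01-01' )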
When you pass something to a Calculation View's parameter, you already have a string; it will be treated as a string and quoted wherever needed, so there is no need to add more quotes. But if you really need to pass quotes inside the placeholder's value, you also need to escape them with a backslash in addition to doubling them. (This was found by doing a data preview on the calculation view and entering '*' as the value of the input parameter; you'll then find the valid SQL statement in the preview's log.)
do
begin
select *
from "_SYS_BIC"."ztest/CV_TEST_PERF"(
PLACEHOLDER."$$P_DUMMY$$" => '''*'''
);
end;
/*
SAP DBTech JDBC: [339]: invalid number: : line 3 col 3 (at pos 13): invalid number:
not a valid number string '' at function __typecast__()
*/
/*And in the trace there's no more information, but the interesting part
is the preparation step, not the execution
w SQLScriptExecuto se_eapi_proxy.cc(00145) : Error <exception 71000339:
not a valid number string '' at function __typecast__()
> in preparation of internal statement:
*/
do
begin
select *
from "_SYS_BIC"."ztest/CV_TEST_PERF"(
PLACEHOLDER."$$P_DUMMY$$" => '\'*\''
);
end;
/*
SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "\": line 5 col 38 (at pos 121)
*/
But this is ok:
do
begin
select *
from "_SYS_BIC"."ztest/CV_TEST_PERF"(
PLACEHOLDER."$$P_DUMMY$$" => '\''*\'''
);
end;
LOG_ID | DATUM | INPUT_PARAM | CUR_DATE
--------------------------+----------+-------------+---------
8IPYSJ23JLVZATTQYYBUYMZ9V | 20201224 | '*' | 20201224
3APKAAC9OGGM2T78TO3WUUBYR | 20201224 | '*' | 20201224
F0QVK7BVUU5IQJRI2Q9QLY0WJ | 20201224 | '*' | 20201224
CW8ISV4YIAS8CEIY8SNMYMSYB | 20201224 | '*' | 20201224
What about the star itself:
As @LarsBr already said, in SQL you need to use LIKE '%pattern%' to search for strings that contain a pattern in the middle; % is the equivalent of ABAP's * (though as far as I know, * is the more common wildcard outside the SQL world). So there's no out-of-the-box conversion of FIELD = '*' to FIELD LIKE '%' or anything similar.
But there's no LIKE predicate in the Column Engine (neither in filters nor in calculated columns).
If you really need LIKE functionality in a filter or a calculated column, you can:
Switch the execution engine to SQL
Or use the match(arg, pattern) function of the Column Engine, which has now disappeared from the palette and is hidden quite well in the documentation (here, at the very end of the page, after digging into the description field of the last row in the table, you'll find the actual syntax for it. Damn!).
But here you'll meet another surprise: since the Column Engine has different operators than SQL (it is more internal and closer to the DB core), it uses the star (*) as its wildcard character. So for match(string, pattern) you need to use a star again: match('pat string tern', 'pat*tern').
All that said: there are cases where you may really want to search for data with wildcards and pass them in as a parameter. But then you need to use match and pass the parameter as plain text, with no tricks around the star (*), if you want to use officially supported features rather than exploit internals.
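The filter in question was presumably of this shape (a guess based on the match syntax above and on the output that follows):
match("LOG_ID", '$$P_DUMMY$$')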
After adding this filter to RSPCLOGCHAIN projection node of my CV from the previous thread, it works this way:
do
begin
select *
from "_SYS_BIC"."ztest/CV_TEST_PERF"(
PLACEHOLDER."$$P_DUMMY$$" => 'CW*'
);
end;
LOG_ID | DATUM | INPUT_PARAM | CUR_DATE
--------------------------+----------+-------------+---------
CW8ISV4YIAS8CEIY8SNMYMSYB | 20201224 | CW* | 20201224
do
begin
select *
from "_SYS_BIC"."ztest/CV_TEST_PERF"(
PLACEHOLDER."$$P_DUMMY$$" => 'CW'
);
end;
/*
Fetched 0 row(s) in 0 ms 0 µs (server processing time: 0 ms 0 µs)
*/
The notation with triple quotation marks '''*''' is likely what yields the syntax error here.
Instead, use single quotation marks to provide the '*' string.
But that is just half of the challenge here.
In SQL, the placeholder search is done via LIKE and the placeholder character is %, not *.
To mimic the ABAP behaviour when using calculation views, the input parameters must be used in filter expressions in the calculation view, and those filter expressions have to check whether the input parameter value is * or not: if it is *, the filter condition needs to be a LIKE, otherwise an = (equals) condition.
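Expressed as plain SQL for clarity (item_type_cd is a hypothetical column name; the real check would live in the view's filter expression), the logic would be:
( '$$IP_ItemTypecd$$' = '*' AND item_type_cd LIKE '%' )
OR ( '$$IP_ItemTypecd$$' <> '*' AND item_type_cd = '$$IP_ItemTypecd$$' )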
A final comment: the PLACEHOLDER-syntax really only works with calculation views and not with tables.

How to store an array of date ranges in Postgres?

I am trying to build a schedule. I generate an array of objects on the client containing date ranges:
[
{start: "2020-07-06 0:0", end: "2020-07-10 23:59"},
{start: "2020-07-13 0:0", end: "2020-07-17 23:59"}
]
I have a column of type daterange[]. What is the proper way to format this data to insert it into my table?
This is what I have so far:
INSERT INTO schedules(owner, name, dates) VALUES (
1,
'work',
'{
{[2020-07-06 0:0,2020-07-10 23:59]},
{[2020-07-13 0:0,2020-07-17 23:59]}
}'
)
I think you want:
insert into schedules(owner, name, dates) values (
1,
'work',
array[
'[2020-07-06, 2020-07-11)'::daterange,
'[2020-07-13, 2020-07-18)'::daterange
]
);
Rationale:
you are using dateranges, so you cannot have time portions (for that, you would need tsrange instead; see the sketch after this list); as your code stands, it seems you want an inclusive lower bound and an exclusive upper bound (hence [ on the left side and ) on the right side)
explicit casting is needed so Postgres can recognize that the array elements have the proper datatype (otherwise, they look like text)
then, you can surround the list of ranges with the array[] constructor
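If the time-of-day portions matter, the same idea works with tsrange[]; a minimal sketch (schedules_ts is a hypothetical table mirroring the one above, with the same half-open bound convention):
create table schedules_ts (owner int, name text, dates tsrange[]);
insert into schedules_ts (owner, name, dates) values (
1,
'work',
array[
'[2020-07-06 00:00, 2020-07-11 00:00)'::tsrange,
'[2020-07-13 00:00, 2020-07-18 00:00)'::tsrange
]
);
-- containment test: does a given timestamp fall inside any stored range?
select * from schedules_ts where '2020-07-08 12:00'::timestamp <@ any (dates);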
Demo on DB Fiddle:
owner | name | dates
----: | :--- | :----------------------------------------------------
1 | work | {"[2020-07-06,2020-07-11)","[2020-07-13,2020-07-18)"}

How do I identify problematic documents in S3 when querying data in Athena?

I have a basic Athena query like this:
SELECT *
FROM my.dataset LIMIT 10
When I try to run it I get an error message like this:
Your query has the following error(s):
HIVE_BAD_DATA: Error parsing field value for field 2: For input string: "32700.000000000004"
How do I identify the S3 document that has the invalid field?
My documents are JSON.
My table looks like this:
CREATE EXTERNAL TABLE my.data (
`id` string,
`timestamp` string,
`profile` struct<
`name`: string,
`score`: int>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'ignore.malformed.json' = 'true'
)
LOCATION 's3://my-bucket-of-data'
TBLPROPERTIES ('has_encrypted_data'='false');
Inconsistent schema
An inconsistent schema is when values in some rows are of a different data type. Let's assume that we have two JSON files:
// inside s3://path/to/bad.json
{"name":"1Patrick", "age":35}
{"name":"1Carlos", "age":"eleven"}
{"name":"1Fabiana", "age":22}
// inside s3://path/to/good.json
{"name":"2Patrick", "age":35}
{"name":"2Carlos", "age":11}
{"name":"2Fabiana", "age":22}
Then a simple query SELECT * FROM some_table will fail with
HIVE_BAD_DATA: Error parsing field value 'eleven' for field 1: For input string: "eleven"
However, we can exclude that file with a WHERE clause:
SELECT
"$PATH" AS "source_s3_file",
*
FROM some_table
WHERE "$PATH" != 's3://path/to/bad.json'
Result:
source_s3_file | name | age
---------------------------------------
s3://path/to/good.json | 2Patrick | 35
s3://path/to/good.json | 2Carlos | 11
s3://path/to/good.json | 2Fabiana | 22
Of course, this is the best-case scenario, when we know which files are bad. However, you can employ this approach to somewhat manually infer which files are good. You can also use LIKE or regexp_like to walk through multiple files at a time.
SELECT
COUNT(*)
FROM some_table
WHERE regexp_like("$PATH", 's3://path/to/go[a-z]*.json')
-- If this query doesn't fail, then those files are good.
The obvious drawback of this approach is the cost and time of executing the queries, especially if it is done file by file.
Malformed records
In the eyes of AWS Athena, good records are those formatted as a single JSON object per line:
{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
AWS Athena supports the OpenX JSON SerDe library, which can be set to evaluate malformed records as NULL by specifying
-- When you create table
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
when you create the table. Thus, the following query will reveal files with malformed records:
SELECT
DISTINCT("$PATH")
FROM "some_database"."some_table"
WHERE(
col_1 IS NULL AND
col_2 IS NULL AND
col_3 IS NULL
-- etc
)
Note: you can use just a single col_1 IS NULL if you are 100% sure that it doesn't contain empty fields other than in corrupted rows.
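In that case the check collapses to a single predicate:
SELECT
DISTINCT("$PATH")
FROM "some_database"."some_table"
WHERE col_1 IS NULL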
In general, malformed records are not that big of a deal, provided that 'ignore.malformed.json' = 'true'. For example, if a file contains:
{"name": "2Patrick","age": 35,"address": "North Street"}
{
"name": "2Carlos",
"age": 11,
"address": "Flowers Street"
}
{"name": "2Fabiana","age": 22,"address": "Main Street"}
the following query will still succeed:
SELECT
"$PATH" AS "source_s3_file",
*
FROM some_table
Result:
source_s3_file | name | age | address
-----------------------------|----------|-----|-------------
1 s3://path/to/malformed.json| 2Patrick | 35 | North Street
2 s3://path/to/malformed.json| | |
3 s3://path/to/malformed.json| | |
4 s3://path/to/malformed.json| | |
5 s3://path/to/malformed.json| | |
6 s3://path/to/malformed.json| | |
7 s3://path/to/malformed.json| 2Fabiana | 22 | Main Street
With 'ignore.malformed.json' = 'false' (which is the default behaviour), exactly the same query will throw an error:
HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1]

Fetch email address from NVARCHAR2 DATATYPE

I have a table with a column of the NVARCHAR2 datatype which holds a string.
The string contains some email IDs, which I need to fetch in comma-separated form.
Below is the test data:
create table nvarchar2_email (email_reject nvarchar2(1000));
insert into nvarchar2_email values ('com.wm.app.b2b.server.ServiceException: javax.mail.SendFailedException: Invalid Addresses; nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <manoj.dalai#gmail.com>: Recipient address rejected: User unknown in virtual alias table;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <santoshi.k#gmail.com>: Recipient address rejected: User unknown in virtual alias table
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <biswajit-kumar.p#gmail.com>: Recipient address rejected: User unknown in virtual alias table');
insert into nvarchar2_email values ('com.wm.app.b2b.server.ServiceException: javax.mail.SendFailedException: Invalid Addresses; nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <manoj.dalai#gmail.com>: Recipient address rejected: User unknown in virtual alias table;
nested exception is:
com.sun.mail.smtp.SMTPAddressFailedException: 550 5.1.1 <santoshi.k#gmail.com>: Recipient address rejected: User unknown in virtual alias table');
I am trying to use the SQL below, but it is repeating the email IDs:
select email_reject,
       listagg(regexp_substr(email_reject, '[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', 1, level), ',')
         within group (order by email_reject) invalid_email
from nvarchar2_email
connect by level <= regexp_count(email_reject, '[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}')
group by email_reject
The required output is like:
manoj.dalai#gmail.com,santoshi.k#gmail.com,biswajit-kumar.p#gmail.com
The number of emails can VARY across the rows of the table.
My DB is :
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
select (select listagg (regexp_substr(cast(e.email_reject as varchar2(1000)),'<(.*?#.*?)>',1,level,'',1),',')
within group (order by e.email_reject)
from dual
connect by level <= regexp_count (e.email_reject,'<.*?#.*?>')
) as emails
from nvarchar2_email e
;
P.S.
There seems to be an issue with regexp_substr and nvarchar that causes each character in the result to be preceded by \0.
Tested on Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
According to your example, it would appear that the e-mail address is always presented as <aaaa#bbbb>, meaning a <, a string with a # in the middle, and a > sign.
You could try something like this (I cannot check the syntax, so you might need to do some tests):
SUBSTR(<input string>,
       INSTR(<input string>, '<') + 1,
       INSTR(<input string>, '>') - INSTR(<input string>, '<') - 1
      );
This will yield the FIRST e-mail address within the string. You may use the same concept (feeding it the string with the section containing the first address removed) in a loop to extract the additional addresses within the same string.
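A minimal PL/SQL sketch of that loop, under the <aaaa#bbbb> assumption above (variable names are hypothetical; the cast works around the nvarchar issue mentioned in the earlier answer; enable dbms_output to see the results):
declare
  v_str    varchar2(1000);
  v_emails varchar2(4000);
  v_lt     pls_integer;
  v_gt     pls_integer;
begin
  for r in (select cast(email_reject as varchar2(1000)) as s from nvarchar2_email) loop
    v_str    := r.s;
    v_emails := null;
    loop
      v_lt := instr(v_str, '<');
      v_gt := instr(v_str, '>');
      exit when v_lt = 0 or v_gt = 0;
      -- take what's between the angle brackets
      v_emails := v_emails || case when v_emails is not null then ',' end
                  || substr(v_str, v_lt + 1, v_gt - v_lt - 1);
      -- drop the section just processed and continue with the rest
      v_str := substr(v_str, v_gt + 1);
    end loop;
    dbms_output.put_line(v_emails);
  end loop;
end;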
I can't see a way to do this through a single SELECT statement, because each string may have several addresses (and not all strings have the same number of them).
One option to investigate is to implement a recursive select (Oracle supports this), but it will be much more complex.
Personally, I would go with the approach suggested above.