SQL - Grabbing only a portion of the message in each row

I have a column named "value" in table T that holds long descriptions of errors; here is an example of a few
but it is also grabbing other rows which I don't need.
Please help?

This answers the original version of the question.
To filter the rows, use regexp_like(). I would suggest:
select t.*
from t
where regexp_like(value, '^An image has error at (1203|12345):')
I am guessing that the final colon is important for the matching.

Why can't you use the LIKE operator?
SELECT t.id, t.value, SUBSTR(t.value, 1, INSTR(t.value, ':')) short_value
FROM t
WHERE value LIKE 'An image has error at 1203:%'
OR value LIKE 'An image has error at 12345:%';

Perhaps your best option is a combination of a standard substr+instr to extract the desired value with a regexp_like to decide whether the overall string is wanted.
select substr(value, 1, instr(value, ':')-1 ) value
from d
where regexp_like (value,'An image has error at \d+:');
Although, depending on the exact requirement for the leading text and the following numeric value, perhaps just:
select substr(value, 1, instr(value, ':')-1 ) value
from d
where instr(value, ':') > 1;
Finally, you can stay with regexp_substr if you wish. However, Oracle's syntax for that is quite counterintuitive compared with the usual use of regular expressions:
select value
from (select regexp_substr(value, '(.*):', 1, 1, 'i', 1) value
from d
)
where value is not null;
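One thing to watch in the regexp_substr version: '(.*):' is greedy, so if a message contains more than one colon it captures everything up to the last one, whereas substr+instr stops at the first. Here is a minimal Python sketch of that difference, using Python's re module as a stand-in for Oracle's regex engine. The sample messages and the helper names are made up for illustration; the real contents of the column are not shown in the question.

```python
import re

# Hypothetical sample rows (assumption: the real data is not shown in the question).
rows = [
    "An image has error at 1203: disk full: retry later",
    "An image has error at 12345: timeout",
    "Some other message without the prefix",
]

def prefix_greedy(value):
    # Mirrors the pattern '(.*):' which runs up to the LAST colon.
    m = re.search(r"(.*):", value)
    return m.group(1) if m else None

def prefix_first_colon(value):
    # Mirrors substr(value, 1, instr(value, ':') - 1), which stops at the FIRST colon.
    m = re.search(r"^([^:]*):", value)
    return m.group(1) if m else None

for row in rows:
    print(repr(prefix_greedy(row)), "|", repr(prefix_first_colon(row)))
```

If only the text before the first colon is wanted, a pattern such as '^([^:]*):' in place of '(.*):' should behave like the substr+instr version.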

Related

Date is not displaying correct with substr & like query

I am trying to get this output,
but I am finding that the substr I am using is incorrect.
For example, all my columns are displaying
hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat
Some of them have more characters, so the result is inconsistent when I use substring to get the exact date from the column:
some will return 20191230
but some will return _2020123
How do we tackle this problem?
I am trying to display only the date; this is using SQL in Hue.
When I input my script,
select SUBSTR(input_file_name, LENGTH(input_file_name) - 44, 9) from th_ingestion_status limit 100
I feel my script for the LIKE and SUBSTR statement is incorrect.
If you want the first sequence of 8 digits surrounded by underscores, use regexp_extract():
select regexp_extract(filename, '_([0-9]{8})_', 1)
If you need this after the last /, then:
select regexp_extract(filename, '_([0-9]{8})_[^/]*$', 1)
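To see what the two patterns pick out, here is a small Python sketch that applies the same regular expressions with Python's re module (an assumption: these particular patterns behave the same way in the engine behind regexp_extract). The first path is the one from the question; the second, longer path and the function names are hypothetical, added only to show that a longer file name does not shift the result.

```python
import re

paths = [
    # Path from the question.
    "hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat",
    # Hypothetical longer file name (assumption, for illustration only).
    "hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasdasdasd_20200123_11122333_1234567890123_CO.dat",
]

def first_date(path):
    # First run of exactly 8 digits enclosed in underscores, anywhere in the path.
    m = re.search(r"_([0-9]{8})_", path)
    return m.group(1) if m else None

def date_in_file_name(path):
    # Same idea, but the match must lie after the last '/'.
    m = re.search(r"_([0-9]{8})_[^/]*$", path)
    return m.group(1) if m else None

for p in paths:
    print(first_date(p), date_in_file_name(p))
```

Because the match is anchored on the underscores rather than on a fixed offset from the end, the length of the rest of the name stops mattering.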
Please use the query below. Also, please mention the database you are using, so that a relevant query can be provided. Note that the second underscore is used because the directory name received_files contains the first one, and the length is 8 to cover the full yyyymmdd date.
substr(column_name, instr(column_name, '_', 1, 2) +1, 8)
Oracle Test Case:
select 'hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat', substr('hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat', instr('hdfs://asdasda/asdas/fdsfdsfd/received_files/asdasd_20191231_11122333_123456789_CO.dat', '_', 1, 2) +1, 8)
from dual;

How to get file name without extension with using Regular Expressions

I have a field with the following values. I want to extract only those rows with "xyz" in the field value shown below. Can you please help?
Mydata_xyz_aug21
Mydata2_zzz_aug22
Mydata3_xyz_aug33
One more requirement
I want to extract only "aIBM_MyProjectFile" from following string below, can you please help me with this?
finaldata/mydata/aIBM_MyProjectFile.exe.ld
I've tried this but it didn't work.
select
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','([^/]*)[\.]') exp
from dual;
To extract substrings between the first pair of underscores, you need to use
regexp_substr('Mydata_xyz_aug21','_([^_]+)_', 1, 1, NULL, 1)
To get the file name without the extension, you need
regexp_substr('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld','.*/([^.]+)', 1, 1, NULL, 1)
Note that each regex contains a capturing group (a pattern inside (...)) and this value is accessed with the last 1 argument to the regexp_substr function.
The _([^_]+)_ pattern finds the first _, then places 1 or more chars other than _ into Group 1 and then matches another _.
The .*/([^.]+) pattern matches the whole text up to the last /, then captures 1 or more chars other than . into Group 1 using ([^.]+).
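As a quick check of the two patterns, here is a minimal Python sketch that runs them through Python's re module. This is a stand-in, not Oracle itself: group(1) plays the role of the last argument of regexp_substr, and the helper names are made up for this example.

```python
import re

def between_first_underscores(text):
    # '_([^_]+)_' : first '_', then one or more non-underscore chars (Group 1), then '_'.
    m = re.search(r"_([^_]+)_", text)
    return m.group(1) if m else None

def file_name_without_extension(path):
    # '.*/([^.]+)' : greedy '.*/' runs to the last '/', then Group 1 stops at the first '.'.
    m = re.search(r".*/([^.]+)", path)
    return m.group(1) if m else None

print(between_first_underscores("Mydata_xyz_aug21"))
print(file_name_without_extension("FinalProject/MyProject/aIBM_MyProjectFile.exe.ld"))
```

Note that the second pattern needs at least one / in the input; a bare file name with no directory part gives no match.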
For the first requirement, it would suffice to use LIKE, as posted in answer above:
SELECT column
FROM table
WHERE column LIKE '%xyz%';
For your second requirement (extraction) you will have to use REGEXP_SUBSTR function:
SELECT REGEXP_SUBSTR ('FinalProject/MyProject/aIBM_MyProjectFile.exe.ld', '.*/([^.]+)', 1, 1, NULL, 1)
FROM DUAL
I hope it helped!
Another way to do this is to skip regexp completely:
WITH
aset AS
(SELECT 'with_extension.txt' txt FROM DUAL
UNION ALL
SELECT 'without_extension' FROM DUAL)
SELECT CASE
WHEN INSTR (txt, '.', -1) > 0
THEN
SUBSTR (txt, 1, INSTR (txt, '.', -1) - 1)
ELSE
txt
END
txt
FROM aset
The result of this is
with_extension
without_extension
A BIG Caveat where the regexp is better:
My method doesn't handle this case correctly:
\this\is.a\test
So after I have gone to all this effort, stay with the regexp solutions. I'll leave this here so that others may learn from it.
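To make the caveat concrete, here is a rough Python sketch of the same logic, with str.rfind standing in for INSTR(txt, '.', -1). The second function is not from the answer above; it is one possible way to handle the problem case, assuming / and \ are the only path separators.

```python
def strip_ext_naive(txt):
    # Mirrors the CASE / INSTR(txt, '.', -1) query: cut at the last dot anywhere.
    pos = txt.rfind(".")
    return txt[:pos] if pos >= 0 else txt

def strip_ext_path_aware(txt):
    # Assumption: only a dot AFTER the last path separator starts an extension.
    sep = max(txt.rfind("/"), txt.rfind("\\"))
    pos = txt.rfind(".")
    return txt[:pos] if pos > sep else txt

for sample in ["with_extension.txt", "without_extension", r"\this\is.a\test"]:
    print(sample, "->", strip_ext_naive(sample), "|", strip_ext_path_aware(sample))
```

The naive version cuts \this\is.a\test down to \this\is because the last dot sits in a directory name, while the path-aware version leaves it alone.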

Oracle - REGEXP_SUBSTR leading zeroes ignored issue

While executing the query below I'm getting "235" instead of the expected result "000"
select REGEXP_SUBSTR(000.235||'', '[^.]+', 1, 1) from dual;
Do this instead, and you'll see where the problem comes:
select 000.235||'' from dual
Result:
.235
The regexp picks up the first run of non-period characters, which in this string is "235", so it's working correctly; it's the input value that is broken.
Now, if you'd written it like this, it would be fine:
select REGEXP_SUBSTR('000.235', '[^.]+', 1, 1) from dual
So why the odd presentation of the numeric? What does the data in your table look like? This is unlikely to be the actual query you're running; if you need help with the true query, post it.
Oracle drops the leading zeros when it converts the numeric literal to a string. You can avoid this by passing the value as a string literal (ltrim here only strips the leading space):
select REGEXP_SUBSTR(ltrim(' 000.235')||'', '[^.]+', 1, 1) from dual;
result: 000 as expected

Sql Oracle: Regexp_substr

I have the following expression:
15-JUL-16,20-JUL-16,20-JUL-16,30-JUL-16 in one of my columns.
I successfully used SUBSTR(REGEXP_SUBSTR(base.systemdate, '.+,'), 1, 9) to get 15-JUL-16 (expression until first comma) from the expression.
But I can't figure out how to get 30-JUL-16 (the last expression after last comma).
Is there some way to use REGEXP_SUBSTR to get that? And since we are at it.
Is there a neat way to only use REGEXP_SUBSTR to get 15-JUL-16 without comma? Because I am using second SUBSTR to get rid of the comma, so I can get it compatible with data format.
You can use a very similar construct:
SELECT REGEXP_SUBSTR(base.systemdate, '[^,]+$')
Oracle's regular expressions (like regular expressions in general) are "greedy". This means that they take the longest match. If you know the items in the list are all the same length, you could just use:
SELECT SUBSTR(base.systemdate, -9)
Try this
select dates from
(
SELECT dates,max(id) over (partition by null) lastrec,min(id) over (partition by null) firstrec,id FROM (
with mine as(select '15-JUL-16,20-JUL-16,20-JUL-16,30-JUL-16' hello from dual)
select rownum id,regexp_substr(hello, '[^,]+', 1, level) dates from mine
connect by regexp_substr(hello, '[^,]+', 1, level) is not null)
)
where id=firstrec or id=lastrec
This query gives you the first and last records from a comma-separated list.
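For comparison, here is a short Python sketch of the first-item and last-item patterns, using Python's re module as a stand-in for REGEXP_SUBSTR (variable names are made up for this example). It also suggests an answer to the second part of the question: the pattern '[^,]+' on its own should return the first item without the comma, so the extra SUBSTR would not be needed.

```python
import re

dates = "15-JUL-16,20-JUL-16,20-JUL-16,30-JUL-16"

# '[^,]+' : first run of non-comma characters, i.e. the first item, no comma included.
first_item = re.search(r"[^,]+", dates).group(0)

# '[^,]+$' : run of non-comma characters anchored at the end, i.e. the last item.
last_item = re.search(r"[^,]+$", dates).group(0)

# Plain splitting gives the same two values without any regex.
parts = dates.split(",")

print(first_item, last_item)  # 15-JUL-16 30-JUL-16
print(parts[0], parts[-1])    # 15-JUL-16 30-JUL-16
```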

Extract text before third - "Dash" in SQL

Can you please help to get this code for SQL?
I have column name INFO_01 which contain info like:
D10-52247-479-245 HALL SO
and I would like to extract only
D10-52247-479
I want the part of the text before the third "-" dash.
You'll need to get the position of the third dash (using instr) and then use substr to get the necessary part of the string.
with temp as (
select 'D10-52247-479-245 HALL SO' test_string from dual)
select test_string,
instr(test_string,'-',1,3) third_dash,
substr(test_string,1,instr(test_string,'-',1,3)-1) result
from temp;
Here is a simple statement that should work:
SELECT SUBSTR(column, 1, INSTR(column,'-',1,3) - 1) FROM table;
Using a combination of SUBSTR and INSTR will return what you want:
SELECT SUBSTR('D10-52247-479-245', 0, INSTR('D10-52247-479-245', '-', -1, 1)-1) AS output
FROM DUAL
Result:
output
-------------
D10-52247-479
Use:
SELECT SUBSTR(t.column, 0, INSTR(t.column, '-', -1, 1)-1) AS output
FROM YOUR_TABLE t
Reference:
SUBSTR
INSTR
Addendum
If using Oracle10g+, you can use regex via REGEXP_SUBSTR.
I'm assuming MySQL, let me know if I'm wrong here. But using SUBSTRING_INDEX you could do the following:
SELECT SUBSTRING_INDEX(column, '-', 3)
EDIT
Appears to be oracle. Looks like we may have to resort to REGEXP_SUBSTR
SELECT REGEXP_SUBSTR(column, '^([^-]*-){2}[^-]*')
Can't test, so not sure what kind of result that will have...
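For anyone who wants to check the idea outside the database, here is a minimal Python sketch of three ways to take the text before the third dash: by position (like INSTR + SUBSTR), by a regular expression that matches two "chars then dash" groups followed by a dash-free run, and by splitting (like SUBSTRING_INDEX). The helper nth_index is hypothetical and written for this example; when there are fewer than three dashes the positional version here returns the whole string, which is an assumption and not necessarily what the SQL functions do.

```python
import re

info = "D10-52247-479-245 HALL SO"

def nth_index(text, sub, n):
    # 0-based index of the n-th occurrence of sub, or -1 if there are fewer than n.
    pos = -1
    for _ in range(n):
        pos = text.find(sub, pos + 1)
        if pos == -1:
            return -1
    return pos

# Positional approach: find the third dash, keep everything before it.
third = nth_index(info, "-", 3)
by_position = info[:third] if third != -1 else info

# Regex approach: two "non-dash chars + dash" groups, then a run of non-dash chars.
m = re.match(r"([^-]*-){2}[^-]*", info)
by_regex = m.group(0) if m else None

# Split approach: keep the first three dash-separated pieces.
by_split = "-".join(info.split("-")[:3])

print(by_position, by_regex, by_split)
```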