I have a path like this in my TBL_Documents table:
Uploads/Documents/6093/12/695-Graco-SW_5-15-19.pdf
I need to compare it to a file being uploaded now that will look like this:
695-Graco-SW_5-15-19.pdf
I want to compare the path in my table with the uploaded file name. I tried using substring() on the first right / but I don't really get how substring is really working. For example, I tried to do this:
select substring(right(path,1),1,1) as path from TBL_DOCUMENT
but it is only giving me the very first character from the right. I expected to see everything after the last / character.
How can I do this?
I would use an approach of finding how many characters you need to use from the right. I would do this by first reversing the string and then searching for the '/'. This will tell you how many characters from the right this '/' is. I would then use this in the RIGHT function:
SQL Fiddle
MS SQL Server 2017 Schema Setup:
Query 1:
DECLARE #documentName varchar(100) = 'Uploads/Documents/6093/12/695-Graco-SW_5-15-19.pdf'
SELECT RIGHT(#documentName, CHARINDEX('/',REVERSE(#documentName))-1)
Results:
| |
|--------------------------|
| 695-Graco-SW_5-15-19.pdf |
RIGHT(path,1) means you want [1] character from the right of the path string, or 'f'.
You then wrap 'f' in a substring, asking for [1] character starting at the [1]st position of the string. Since the expression passed to substring returns 'f', your substring also returns 'f'.
You want to use a combination of charindex and reverse to handle this appropriately.
SUBSTRING(path,len(path) - charindex('/',reverse(path))). That will not parse but it should get you on the right track.
In normal speak, this returns the string, starting with the right most '/' of the path, to the end of string.
Related
I am using Snowflake SQL. I would like to remove characters from a string after a special character ~. How can I do that?
here is the whole scenario. Let me explain. I do have a string like 'CK#123456~fndkjfgdjkg'. Now, i want only the number after #.And not anything after ~. This is number length varies for that field value. It might be 1 or 5 or 3. And i want to add the condition in where class where this number is equal to check_num from other table after joining. I am trying REGEXP_SUBSTR(A.SRC_TXT, '(?<=CK#)(.+?\b)') = C.CHK_NUM in the where condition. I am getting the error as 'No repititive argument after ?'
You can use a regex for this
-- To remove just the character after a ~
select regexp_replace('fo~o bar','~.', '');
-- returns 'fo bar'
--If you want to keep the ~
select regexp_replace('fo~o bar','~.', '~');
-- returns 'fo~ bar'
--If you want to remove everything after the ~
select regexp_replace('fo~o bar','~.*', '');
--returns 'fo'
If you need to remove other specific character sets after a ~, you can probably do this with a slightly more complicated regex, but I'd need examples of your desired input/output to help with that.
EDIT for updated question
This regex replace should get what you need.
select regexp_replace('CK#123456~fndkjfgdjkg','CK#(\\d*)~.*', '\\1');
-- returns 123456
(\\d*) gets ANY number of digits in a row, and the \\1 causes it to replace the match with what was in the first set of parenthesis, which is your list of digits. the CK# and ~.* are there to make sure the whole string gets matched and replaced.
If the CK# can vary as well, you can use .*? like this.
select regexp_replace('ABCD123HI#123456~fndkjfgdjkg','.*?#(\\d*)~.*', '\\1')
-- returns 123456
I'd probably do something like the following, easy enough but not as cool as RegEx type of functions.
set my_string='fooo~12345';
set search_for_me = '~';
SELECT SUBSTR($my_string, 1, DECODE(position($search_for_me, $my_string), 0, length($my_string), position($search_for_me, $my_string)));
I hope this helps...Rich
It looks like lookahead and lookbehinds are not supported in REGEXP functions, they seem to work in the PATTERN clause of a LIST command. Snowflake documentation makes no mention either way of lookahead or lookbehinds.
In your example:
It seems that the query engine is looking for that repeating argument, where you are attempting a lookbehind
You have not specified what you wanted extracted. You have two capture groups, but in this scenario everything would be returned
Since you are looking to remove everything after ~ you have a delimiter, why not use it in your REGEXP_SUBSTR function?
Try the following:
SELECT $1,REGEXP_SUBSTR($1,'\\w+#(.+?)~',1,1,'is',1)
FROM VALUES
('CK#123456~fndkjfgdjkg')
,('QH#128fklj924~fndkjfgdjkg')
;
This looks for:
One or more word characters
Followed by #
Capturing one or more characters upto and not including ~
Returns the characters within the capture group
You can change the .+? to \\d+? to make sure the pattern is only digits. Backslashes must be escaped with a backslash.
The descriptions for each argument of the function can be found here:
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_substr.html
You could check this!!
select substr('CK#123456~fndkjfgdjkg',4,6) from dual;
OUTPUT
123456
https://docs.snowflake.net/manuals/sql-reference/functions/substr.html
I am doing this in Impala or Hive. Basically let say I have a string like this
f-150:aa|f-150:cc|g-210:dd
Each element is separated by the pipe |. Each has prefix f-150 or whatever. I want to be able to remove the prefix and keep only element that matches specific prefix. For example, if the prefix is f-150, I want the final string after regex_replace is
aa|cc
dd is removed because g-210 is different prefix and not match, therefore the whole element is removed.
Any idea how to do this using string expression in one SQL?
Thanks
UPDATE 1
I tried this in Impala:
select regexp_extract('f-150:aa|f-150:cc|g-210:dd','(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}',0);
But got this output:
f-150:aa
In Hive, I got NULL.
The regexyou in question could look like this:
(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}
I have added some pseudo keywords to retain, but I am sure you get the idea:
Wholy match elements that should be dropped but only match the prefix for those that should be retained.
To keep the separator intact, match | at the beginning of an element in group 1 and put it back in the replacement with $1.
Demo
According to the documentation, your query should be written like a Java regex; likewise, this should perform like this code sample in Java.
You could match the values that you want to remove and then replace with an empty string:
f-150:|\|[^:]+:[^|]+$|[^|]+:[^|]+\|
f-150:|\\|[^:]+:[^|]+$|[^|]+:[^|]+\\|
Explanation
f-150: Match literally
| Or
\|[^:]+:[^|]+$ Match a pipe, not a colon one or more times followed by not a pipe one or more times and assert the end of the line
| Or
[^|]+:[^|]+\| Match not a pipe one or more times, a colon followed by matching not a pipe one or more times and then match a pipe
Test with multiple lines and combinations
You may have to loop through the string until the end to get the all the matching sub string. Look ahead syntax is not supported in most sql so above regexp might not be suitable for SQL syntax. For you purpose you can do something like creating a table to loop through just to mimic Oracle's level syntax and join with your table containing the string.
With loop_tab as (
Select 1 loop union all
Select 2 union all
select 3 union all
select 4 union all
select 5),
string_tab as(Select 'f-150:aa|ade|f-150:ce|akg|f-150:bb|'::varchar(40) as str)
Select regexp_substr(str,'(f\\-150\\:\\w+\\|)',1,loop)
from string_tab
join loop_tab on 1=1
Output:
regexp_substr
f-150:aa|
f-150:ce|
f-150:bb|
using SQL 2008; I have the following string:
EMCo: 1 WorkOrder: 12770 WOItem: 10
I am trying to get the WorkOrder #.
When the string did not have the WOItem on end of it, I was able to use the following statement to get WorkOrder #.
[WorkOrder] = LTRIM(RTRIM(RIGHT(HQMA.KeyString,CHARINDEX(':',REVERSE(HQMA.KeyString))-1)))
This statement moves and may have double digits for the Co#, and it does not always have WOItem #. Was hoping to find something that would split after the ":" and just take 2nd group.
Any suggestions?
The patindex suggestion above will work beautifully, now if you still want to use your current statement, substring will pull the same values, and replace will take out WOItem. Not very elegant, it works whether you have WOItem or not:
select substring(LTRIM(RTRIM(RIGHT(REPLACE(HQMA.KeyString,'WOItem:',''),
CHARINDEX(':',REVERSE(REPLACE(HQMA.KeyString,'WOItem:','')))-1))),0,7)
select substring(LTRIM(RTRIM(RIGHT(REPLACE(HQMA.KeyString,'WOItem:',''),
CHARINDEX(':',REVERSE(REPLACE(HQMA.KeyString,'WOItem:','')))-1))),0,7)
How about using patindex()? Assuming the work order always has five characters:
select substring(HQMA.KeyString,
patindex('%WorkOrder: %', HQMA.KeyString) + 11,
5) as WorkOrder
I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
Where the last parameter 1 indicate the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;
I have a varchar column with Url's with data that looks like this:
http://google.mews.......http://www.somesite.com
I want to get rid of the first http and keep the second one so the row above would result in:
http://www.somesite.com
I've tried using split() but can't get it to work
Thanks
If you are trying to do this using T-SQL, you can try something in the lines of:
-- assume #v is the variable holding the URL
SELECT SUBSTRING(#v, PATINDEX('%_http://%', #v) + 1, LEN(#v))
This will return the start position of the first http:// that has before it at least one character (hence the '%_' before it and the + 1 offset).
If the first URL always starts right from the beginning of the string, you can use SUBSTRING() & CHARINDEX():
SELECT SUBSTRING(column, CHARINDEX('http://', column, 2), LEN(column))
FROM table
CHARINDEX simply searches a string for a substring and returns the substring's starting position within the string. Its third argument is optional and, if set, specifies the search starting position, in this case it's 2 so it didn't hit the first http://.