SQL Regex to select string between second and third forward slash - sql

I am using Postgres/Redshift to query a table of URLs and am trying to use
SELECT regexp_substr to select the string between the second and third forward slash in the column.
For example, I need the second slash-delimited string in the following data:
/abc/required_string/5856365/
/abc/required_string/2/
/abc/required_string/l-en/
/abc/required_string/l-en/
Following some of the regexes in this thread:
SELECT regexp_substr(column, '/[^/]*/([^/]*)/')
FROM table
None seem to work. I keep getting:
/abc/required_string/
/abc/required_string/

What about split_part?
SELECT split_part(column, '/', 3) FROM table
Example:
select split_part ('/abc/required_string/2/', '/', 3)
Returns: required_string
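The field number is 3 rather than 2 because the value starts with the delimiter, which makes the first field an empty string. A minimal Python sketch of that counting; the `split_part` helper here is a hypothetical stand-in that mimics the SQL function's 1-based indexing, not a database call:

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Hypothetical stand-in for SQL split_part: 1-based field index,
    empty string when the requested field does not exist."""
    fields = text.split(delimiter)
    return fields[n - 1] if 1 <= n <= len(fields) else ""

url = "/abc/required_string/2/"

# The leading slash makes field 1 the empty string, so the
# second path segment is field 3.
fields = url.split("/")
print(fields)                   # ['', 'abc', 'required_string', '2', '']
print(split_part(url, "/", 3))  # required_string
```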

This may work:
PostgreSQL 9.3 Schema Setup:
CREATE TABLE t
("c" varchar(29))
;
INSERT INTO t
("c")
VALUES
('/abc/required_string/5856365/'),
('/abc/required_string/2/'),
('/abc/required_string/l-en/'),
('/abc/required_string/l-en/')
;
Query 1:
SELECT substring("c" from '/[^/]*/([^/]*)/')
FROM t
Results:
| substring |
|-----------------|
| required_string |
| required_string |
| required_string |
| required_string |
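The difference from the output in the question comes down to whole match versus capture group: PostgreSQL's substring(string from pattern) returns the text matched by the first parenthesized subexpression when the pattern has one, whereas the question's output is the whole match. A small Python sketch of the same distinction, using Python's re module rather than the database's regex engine:

```python
import re

pattern = re.compile(r"/[^/]*/([^/]*)/")
url = "/abc/required_string/5856365/"

match = pattern.search(url)
whole_match = match.group(0)  # everything the pattern consumed
first_group = match.group(1)  # only the parenthesized part

print(whole_match)  # /abc/required_string/
print(first_group)  # required_string
```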

Related

SQL - trimming values before bracket

I have a column of values where some values contain brackets with text which I would like to remove. This is an example of what I have and what I want:
CREATE TABLE test
(column_i_have varchar(50),
column_i_want varchar(50));
INSERT INTO test (column_i_have, column_i_want)
VALUES ('hospital (PWD)', 'hopistal'),
('nursing (LLC)','nursing'),
('longterm (AT)', 'longterm'),
('inpatient', 'inpatient');
I have only come across approaches that use the number of characters or the position to trim the string, but these values have varying lengths. One way I was thinking was something like:
TRIM('(*',col1)
This doesn't work. Is there a way to do this in PostgreSQL without using the position? Thank you!
If all the values contain "valid" brackets, then you can use the split_part function without any regular expressions:
select
test.*,
trim(split_part(column_i_have, '(', 1)) as res
from test
column_i_have | column_i_want | res
:------------- | :------------ | :--------
hospital (PWD) | hopistal | hospital
nursing (LLC) | nursing | nursing
longterm (AT) | longterm | longterm
inpatient | inpatient | inpatient
You can replace partial patterns using regular expressions. For example:
select *, regexp_replace(v, '\([^\)]*\)', '', 'g') as r
from (
select '''hospital (PWD)'', ''nursing (LLC)'', ''longterm (AT)'', ''inpatient''' as v
) x
Result:
r
-------------------------------------------------
'hospital ', 'nursing ', 'longterm ', 'inpatient'
Could it be as easy as:
SELECT SUBSTRING(column_i_have, '\w+') AS column_i_want FROM test
If not, and you still want to use SUBSTRING() to get everything up to but excluding the parenthesis, then maybe:
SELECT SUBSTRING(column_i_have, '^(.+?)(?:\s*\(.*)?$') AS column_i_want FROM test
But if you really only need everything up to the opening parenthesis, then maybe just use SPLIT_PART():
SELECT SPLIT_PART(column_i_have, ' (', 1) AS column_i_want FROM test
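The answers above differ in what they leave behind. A rough Python sketch of three of them (split before the bracket, delete the bracketed group, take the first run of word characters), using Python's re module as an approximation of the PostgreSQL functions:

```python
import re

values = ["hospital (PWD)", "nursing (LLC)", "longterm (AT)", "inpatient"]

# Analogue of trim(split_part(col, '(', 1)): keep the text before the
# first '(' and trim the surrounding whitespace.
by_split = [v.split("(", 1)[0].strip() for v in values]

# Analogue of regexp_replace(col, '\([^\)]*\)', '', 'g'): delete each
# bracketed group; the space in front of the bracket is left behind.
by_replace = [re.sub(r"\([^)]*\)", "", v) for v in values]

# Analogue of substring(col, '\w+'): first run of word characters only.
by_word = [re.search(r"\w+", v).group(0) for v in values]

print(by_split)    # ['hospital', 'nursing', 'longterm', 'inpatient']
print(by_replace)  # ['hospital ', 'nursing ', 'longterm ', 'inpatient']
print(by_word)     # ['hospital', 'nursing', 'longterm', 'inpatient']
```

The `\w+` variant stops at the first non-word character, so a value such as 'long term (AT)' would be cut down to 'long'; the split-based variants keep the whole text before the bracket.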

How to use the SQL REPLACE Function, so that it will replace some text between a certain range, rather than one specific value

I have a table called Product and I am trying to replace some of the values in the Product ID column pictured below:
ProductID
PIDLL0000074853
PIDLL000086752
PIDLL00000084276
I am familiar with the REPLACE function and have used this like so:
SELECT REPLACE(ProductID, 'LL00000', '/') AS 'Product Code'
FROM Product
Which returns:
Product Code
PID/74853
PIDLL000086752
PID/084276
The letter L always appears twice (LL) in the ProductID. However, the number of zeros that follow ranges from 4 to 6. The Ls and 0s should be replaced with a single /.
If anyone could suggest the best way to achieve this, it would be greatly appreciated. I'm using Microsoft SQL Server, so standard SQL syntax would be ideal.
Please try the following solution.
All credit goes to @JeroenMostert
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ProductID VARCHAR(50));
INSERT INTO @tbl (ProductID) VALUES
('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PITLL0000084770');
-- DDL and sample data population, end
-- Keep the 3-letter prefix, strip the first 5 characters, and convert the rest to a number to drop the leading zeros
SELECT *
, CONCAT(LEFT(ProductID,3),'/', CONVERT(DECIMAL(38, 0), STUFF(ProductID, 1, 5, ''))) AS [After]
FROM @tbl;
Output
+----+------------------+-----------+
| ID | ProductID | After |
+----+------------------+-----------+
| 1 | PIDLL0000074853 | PID/74853 |
| 2 | PIDLL000086752 | PID/86752 |
| 3 | PIDLL00000084276 | PID/84276 |
| 4 | PITLL0000084770 | PIT/84770 |
+----+------------------+-----------+
This isn't particularly pretty in T-SQL, as it doesn't support regex or even pattern replacement. Therefore your method is to use things like CHARINDEX and PATINDEX to find the start and end positions and then replace (not with the REPLACE function) that part of the text.
This uses CHARINDEX to find the 'LL', and then PATINDEX to find the first non '0' character after that position. As PATINDEX doesn't support a start position I have to use STUFF to remove the first characters.
Then, finally, we can use STUFF (again) to replace the length of characters with a single '/':
SELECT STUFF(V.ProductID,CI.I+2,ISNULL(PI.I,0),'/')
FROM (VALUES('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PIDLL3246954384276'))V(ProductID)
CROSS APPLY(VALUES(NULLIF(CHARINDEX('LL',V.ProductID),0)))CI(I)
CROSS APPLY(VALUES(NULLIF(PATINDEX('%[^0]%',STUFF(V.ProductID,1,CI.I+2,'')),1)))PI(I);
If you are always starting with "PIDLL", you can just remove the "PIDLL", cast the rest as an INT to lose the leading 0's, then prepend "PID/" to the result. One line of code.
-- Sample Data
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES('PIDLL0000074853'),('PIDLL000086752'),('PIDLL00000084276');
-- Solution
SELECT t.ProductID, NewProdID = 'PID/'+LEFT(CAST(REPLACE(t.ProductID,'PIDLL','') AS INT),20)
FROM @t AS t;
Returns:
ProductID NewProdID
------------------ ----------------
PIDLL0000074853 PID/74853
PIDLL000086752 PID/86752
PIDLL00000084276 PID/84276
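For checking the expected results outside the database, the rule from the question ("LL and the zeros that follow become a single /") can be written as one pattern. This is a Python sketch for verification only, not T-SQL, and it assumes the first LL in the value is the one to replace:

```python
import re

product_ids = ["PIDLL0000074853", "PIDLL000086752", "PIDLL00000084276"]

# "LL" followed by any run of zeros collapses to a single "/".
# count=1 limits the substitution to the first occurrence.
converted = [re.sub(r"LL0*", "/", pid, count=1) for pid in product_ids]

print(converted)  # ['PID/74853', 'PID/86752', 'PID/84276']
```

One difference from the numeric-cast answers: a value whose digits are all zeros would end up as 'PID/' here, whereas casting to a number would leave a single 0.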

Merging tags to values separated by new line character in Oracle SQL

I have a database field with several values separated by newlines.
E.g. (there can be more than three):
A
B
C
I want to modify these values by adding tags at the front and end,
i.e. the three values above should be turned into
<Test>A</Test>
<Test>B</Test>
<Test>C</Test>
Is there any possible query operation in Oracle SQL to perform such an operation?
Just replace the start and end of each string with the XML tags using a multi-line match parameter of the regular expression:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE( value, '^', '<Test>', 1, 0, 'm' ),
'$', '</Test>', 1, 0, 'm'
) AS replaced_value
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'A
B
C' FROM DUAL;
Outputs:
| REPLACED_VALUE |
| :------------- |
| <Test>A</Test> |
| <Test>B</Test> |
| <Test>C</Test> |
You can use normal replace function as follows:
Select '<test>'
|| replace(your_column,chr(10),'</test>'||chr(10)||'<test>')
|| '</test>'
From your_table;
It will be faster than the regexp_replace version.
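Both answers rely on the same idea: every line start gets an opening tag and every line end gets a closing tag. A Python sketch of the two approaches side by side, using Python's re module and str.replace as stand-ins for the Oracle functions (the variable names are illustrative):

```python
import re

value = "A\nB\nC"

# Analogue of REGEXP_REPLACE with the 'm' flag: in multi-line mode
# ^ and $ match at the start and end of every line.
with_regex = re.sub(r"^(.*)$", r"<Test>\1</Test>", value, flags=re.MULTILINE)

# Analogue of the plain REPLACE answer: wrap the whole value, then turn
# each newline into "closing tag, newline, opening tag".
with_replace = "<Test>" + value.replace("\n", "</Test>\n<Test>") + "</Test>"

print(with_regex)
print(with_regex == with_replace)  # True
```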

SQL: Select rows that contain a word

The goal is to select all rows that contain a specific word. The word can be at the beginning or the end of the string and/or surrounded by whitespace, but it must not be part of another word.
Here are a couple of rows in my database:
+---+--------------------+
| 1 | string with test |
+---+--------------------+
| 2 | test string |
+---+--------------------+
| 3 | testing stringtest |
+---+--------------------+
| 4 | not-a-test |
+---+--------------------+
| 5 | test |
+---+--------------------+
So in this example, selecting the word test should return rows 1, 2 and 5.
The problem is that, for some reason, SELECT * FROM ... WHERE ... RLIKE '(\s|^)test(\s|$)'; returns 0 rows.
Where am I going wrong, and how could it be done better?
Edit: The query should also select the row that contains just the word test.
The answer to my first question is:
I hadn't escaped the backslashes, so \s should be \\s.
Working query: SELECT * FROM ... WHERE ... RLIKE '(\\s|^)test(\\s|$)'; (using just a space, ( |^) and ( |$), also works).
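To illustrate why the doubled backslash matters: with default settings, MySQL-family string literals treat the backslash as an escape character, and for an unrecognized escape such as \s the backslash is dropped, so the regex engine receives a plain letter s. The Python sketch below assumes that behavior and simply writes out the two patterns the engine would end up with; the row data is copied from the question and the helper name is hypothetical:

```python
import re

rows = {
    1: "string with test",
    2: "test string",
    3: "testing stringtest",
    4: "not-a-test",
    5: "test",
}

def matching_ids(pattern: str) -> list:
    """Return the ids of the rows in which the pattern is found."""
    return [row_id for row_id, text in rows.items() if re.search(pattern, text)]

# Pattern the regex engine sees when the SQL literal is '(\\s|^)test(\\s|$)'.
intended = r"(\s|^)test(\s|$)"
# Pattern left over when the literal is '(\s|^)test(\s|$)' and the string
# parser drops the backslash (assumed default behavior): \s becomes s.
degraded = r"(s|^)test(s|$)"

print(matching_ids(intended))  # [1, 2, 5]
print(matching_ids(degraded))  # [5]
```

With the degraded pattern, rows 1 and 2 are no longer found because a space is not accepted next to the word anymore.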
You could match the word once with a leading space and once with a trailing space:
SELECT * from new_table
where text RLIKE(' test')
union
SELECT * from new_table
where text RLIKE('test ')
The REGEXP_INSTR() function, which is an extension of the INSTR() function, can be used in version 10.0.5+ and is case-insensitive by default:
SELECT *
FROM t
WHERE REGEXP_INSTR(str, 'TeSt ')>0
OR REGEXP_INSTR(str, ' tESt')>0
SELECT * FROM ...
WHERE ... LIKE 'test';
This should do the trick.
Is this what you want?
SELECT * FROM ... WHERE ... LIKE
'%test%';
Use word boundary tests:
Before MySQL 8.0, and in MariaDB:
WHERE ... REGEXP '[[:<:]]test[[:>:]]'
MySQL 8.0:
WHERE ... REGEXP '\btest\b'
(If that does not work, double up the backslashes; this depends on whether the client is collapsing backslashes before MySQL gets them.)
Note that this solution will also work with punctuation such as the comma in "foo, test, bar"
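Word boundaries and whitespace delimiters do not select the same rows, which is worth checking against the requirements in the question. A Python sketch comparing the two patterns with Python's re module (its \b is assumed to behave like the word-boundary markers above for these ASCII examples):

```python
import re

texts = [
    "string with test",
    "testing stringtest",
    "not-a-test",
    "foo, test, bar",
]

whitespace_delimited = re.compile(r"(\s|^)test(\s|$)")
word_boundary = re.compile(r"\btest\b")

by_whitespace = [t for t in texts if whitespace_delimited.search(t)]
by_boundary = [t for t in texts if word_boundary.search(t)]

print(by_whitespace)  # ['string with test']
print(by_boundary)    # ['string with test', 'not-a-test', 'foo, test, bar']
```

The word-boundary pattern accepts punctuation next to the word, so it also matches 'not-a-test', which the question wanted to exclude; the whitespace pattern excludes it but also misses 'foo, test, bar'.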

regex in postgres 8.4

I am trying to write this query:
SELECT DISTINCT createdcfgid FROM ab WHERE (createdcfgid ~ ‘^[0-9]+$’)
This results in
syntax error at or near "[" LINE 3: WHERE (createdcfgid ~ ‘^[0-9]+$’)
Anyone there who can give me a clue about what I am doing wrong?
Thanks in advance
It looks like you are just using the wrong quotes; try ' rather than ‘:
PostgreSQL 8.4 Schema Setup:
create table ab(createdcfgid text);
insert into ab(createdcfgid) values ('111');
Query:
SELECT DISTINCT createdcfgid FROM ab WHERE (createdcfgid ~ '^[0-9]+$')
Results:
| CREATEDCFGID |
----------------
| 111 |