SQL: Select rows that contain a word - sql

The goal is to select all rows that contain some specific word, can be in the beginning or the end of the string and/or surrounded by white-space, should not be inside other word, so to speak.
Here are couple rows in my database:
+---+--------------------+
| 1 | string with test |
+---+--------------------+
| 2 | test string |
+---+--------------------+
| 3 | testing stringtest |
+---+--------------------+
| 4 | not-a-test |
+---+--------------------+
| 5 | test |
+---+--------------------+
So in this example, selecting word test, should return rows 1, 2 and 5.
Problem is that for some reason, SELECT * FROM ... WHERE ... RLIKE '(\s|^)test(\s|$)'; returns 0 rows.
Where am I wrong and maybe, how it could be done better?
Edit: Query should also select the row with just a word test.
The answer to my first question is:
I haven't escaped special characters, so \s should be \\s.
Working query: SELECT * FROM ... WHERE ... RLIKE '(\\s|^)test(\\s|$)';. (or just a space ( |^)/( |$), also works)

Hi you could grab with trailing space and with leading space
SELECT * from new_table
where text RLIKE(' test')
union
SELECT * from new_table
where text RLIKE('test ')

REGEXP_INSTR() function, which's is an extension of the INSTR() function, might be used for version 10.0.5+ case-insensitively as default :
SELECT *
FROM t
WHERE REGEXP_INSTR(str, 'TeSt ')>0
OR REGEXP_INSTR(str, ' tESt')>0
Demo

SELECT * FROM ...
WHERE ... LIKE 'test';
This should do the trick.

Is this what you want?
SELECT * FROM ... WHERE ... LIKE
'%test%';

Use word boundary tests:
Before MySQL 8.0, and in MariaDB:
WHERE ... REGEXP '[[:<:]]test[[:>:]]'
MySQL 8.0:
WHERE ... REGEXP '\btest\b'
(If that does not work, double up the backslashes; this depends on whether the client is collapsing backslashes before MySQL gets them.)
Note that this solution will also work with punctuation such as the comma in "foo, test, bar"

Related

SQL - trimming values before bracket

I have a column of values where some values contain brackets with text which I would like to remove. This is an example of what I have and what I want:
CREATE TABLE test
(column_i_have varchar(50),
column_i_want varchar(50))
INSERT INTO test (column_i_have, column_i_want)
VALUES ('hospital (PWD)', 'hopistal'),
('nursing (LLC)','nursing'),
('longterm (AT)', 'longterm'),
('inpatient', 'inpatient')
I have only come across approaches that use the number of characters or the position to trim the string, but these values have varying lengths. One way I was thinking was something like:
TRIM('(*',col1)
Doesn't work. Is there a way to do this in postgres SQL without using the position? THANK YOU!
If all the values contain "valid" brackets, then you may use split_part function without any regular expressions:
select
test.*,
trim(split_part(column_i_have, '(', 1)) as res
from test
column_i_have | column_i_want | res
:------------- | :------------ | :--------
hospital (PWD) | hopistal | hospital
nursing (LLC) | nursing | nursing
longterm (AT) | longterm | longterm
inpatient | inpatient | inpatient
db<>fiddle here
You can replace partial patterns using regular expressions. For example:
select *, regexp_replace(v, '\([^\)]*\)', '', 'g') as r
from (
select '''hospital (PWD)'', ''nursing (LLC)'', ''longterm (AT)'', ''inpatient''' as v
) x
Result:
r
-------------------------------------------------
'hospital ', 'nursing ', 'longterm ', 'inpatient'
See example at db<>fiddle.
Could it be as easy as:
SELECT SUBSTRING(column_i_have, '\w+') AS column_i_want FROM test
See demo
If not, and you still want to use SUBSTRING() to get upto but exclude paranthesis, then maybe:
SELECT SUBSTRING(column_i_have, '^(.+?)(?:\s*\(.*)?$') AS column_i_want FROM test
See demo
But if you really are looking upto the opening paranthesis, then maybe just use SPLIT_PART():
SELECT SPLIT_PART(column_i_have, ' (', 1) AS column_i_want FROM test
See demo

How to replace rows containing alphabets and special characters with Blank spaces in snowflake

I have a column "A" which contains numbers for example- 0001, 0002, 0003
the same column "A" also contains some alphabets and special characters in some of the rows for example - connn, cco*jjj, hhhhhh11111 etc.
I want to replace these alphabets and special characters rows with blank values and only want to keep the rows containing the number.
which regex expression I can use here?
If you want to extract numbers from these values (even if they end or start with non digits), you may use something like this:
create table testing ( A varchar ) as select *
from values ('123'),('aaa123'),('3343'),('aaaa');
select REGEXP_SUBSTR( A, '\\D*(\\d+)\\D*', 1, 1, 'e', 1 ) res
from testing;
+------+
| RES |
+------+
| 123 |
| 123 |
| 3343 |
| NULL |
+------+
I understand that you want to set to null all values that do not contain digits only.
If so, you can use try_to_decimal():
update mytable
set a = null
where a try_to_decimal(a) is null
Or a regexp match:
update mytable
set a = null
where a rlike '[^0-9]'

PostgreSQL - Extract string before ending delimiter

I have a column of data that looks like this:
58,0:102,56.00
52,0:58,68
58,110
57,440.00
52,0:58,0:106,6105.95
I need to extract the character before the last delimiter (',').
Using the data above, I want to get:
102
58
58
57
106
Might be done with a regular expression in substring(). If you want:
the longest string of only digits before the last comma:
substring(data, '(\d+)\,[^,]*$')
Or you may want:
the string before the last comma (',') that's delimited at the start either by a colon (':') or the start of the string.
Could be another regexp:
substring(data, '([^:]*)\,[^,]*$')
Or this:
reverse(split_part(split_part(reverse(data), ',', 2), ':', 1))
More verbose but typically much faster than a (expensive) regular expression.
db<>fiddle here
Can't promise this is the best way to do it, but it is a way to do it:
with splits as (
select string_to_array(bar, ',') as bar_array
from foo
),
second_to_last as (
select
bar_array[cardinality(bar_array)-1] as field
from splits
)
select
field,
case
when field like '%:%' then split_part (field, ':', 2)
else field
end as last_item
from second_to_last
I went a little overkill on the CTEs, but that was to expose the logic a little better.
With a CTE that removes everything after the last comma and then splits the rest into an array:
with cte as (
select
regexp_split_to_array(
replace(left(col, length(col) - position(',' in reverse(col))), ':', ','),
','
) arr
from tablename
)
select arr[array_upper(arr, 1)] from cte
See the demo.
Results:
| result |
| ------ |
| 102 |
| 58 |
| 58 |
| 57 |
| 106 |
The following treats the source string as an "array of arrays". It seems each data element can be defined as S(x,y) and the overall string as S1:S2:...Sn.
The task then becomes to extract x from Sn.
with as_array as
( select string_to_array(S[n], ',') Sn
from (select string_to_array(col,':') S
, length(regexp_replace(col, '[^:]','','g'))+1 n
from tablename
) t
)
select Sn[array_length(Sn,1)-1] from as_array
The above extends S(x,y) to S(a,b,...,x,y) the task remains to extracting x from Sn. If it is the case that all original sub-strings S are formatted S(x,y) then the last select reduces to select Sn[1]

SQL Regex to select string between second and third forward slash

I am using Postgres/Redshift to query a table of URLs and am trying to use
SELECT regex_substr to select a string that is between the second and third forward slash in the column.
For example I need the second slash delimited string in the following data:
/abc/required_string/5856365/
/abc/required_string/2/
/abc/required_string/l-en/
/abc/required_string/l-en/
Following some of the regexs in this this thread:
SELECT regexp_substr(column, '/[^/]*/([^/]*)/')
FROM table
None seem to work. I keep getting:
/abc/required_string/
/abc/required_string/
What about split_part?
SELECT split_part(column, '/', 3) FROM table
Example:
select split_part ('/abc/required_string/2/', '/', 3)
Returns: required string
This may work :
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE t
("c" varchar(29))
;
INSERT INTO t
("c")
VALUES
('/abc/required_string/5856365/'),
('/abc/required_string/2/'),
('/abc/required_string/l-en/'),
('/abc/required_string/l-en/')
;
Query 1:
SELECT substring("c" from '/[^/]*/([^/]*)/')
FROM t
Results:
| substring |
|-----------------|
| required_string |
| required_string |
| required_string |
| required_string |

querying WHERE condition to character length?

I have a database with a large number of words but i want to select only those records where the character length is equal to a given number (in example case 3):
$query = ("SELECT * FROM $db WHERE conditions AND length = 3");
But this does not work... can someone show me the correct query?
Sorry, I wasn't sure which SQL platform you're talking about:
In MySQL:
$query = ("SELECT * FROM $db WHERE conditions AND LENGTH(col_name) = 3");
in MSSQL
$query = ("SELECT * FROM $db WHERE conditions AND LEN(col_name) = 3");
The LENGTH() (MySQL) or LEN() (MSSQL) function will return the length of a string in a column that you can use as a condition in your WHERE clause.
Edit
I know this is really old but thought I'd expand my answer because, as Paulo Bueno rightly pointed out, you're most likely wanting the number of characters as opposed to the number of bytes. Thanks Paulo.
So, for MySQL there's the CHAR_LENGTH(). The following example highlights the difference between LENGTH() an CHAR_LENGTH():
CREATE TABLE words (
word VARCHAR(100)
) ENGINE INNODB DEFAULT CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci;
INSERT INTO words(word) VALUES('快樂'), ('happy'), ('hayır');
SELECT word, LENGTH(word) as num_bytes, CHAR_LENGTH(word) AS num_characters FROM words;
+--------+-----------+----------------+
| word | num_bytes | num_characters |
+--------+-----------+----------------+
| 快樂 | 6 | 2 |
| happy | 5 | 5 |
| hayır | 6 | 5 |
+--------+-----------+----------------+
Be careful if you're dealing with multi-byte characters.
I think you want this:
select *
from dbo.table
where DATALENGTH(column_name) = 3
SELECT *
FROM my_table
WHERE substr(my_field,1,5) = "abcde";