I'm extracting a number from a string using the following code:
regexp_substr(data, '\\d+\.\\d+') AS Age
Where the value is 0 (within the string), I'm getting a null value. Is there any way to correct this within the wider query, so all the nulls are replaced with 0s?
It would have been helpful if you had shared your sample data, but maybe the following helps:
select
regexp_substr(column1, '\\d+(\.?\\d+)?')
from values
('test0me'),
('234.234'),
('test my age 25.')
;
+------------------------------------------+
| REGEXP_SUBSTR(COLUMN1, '\\D+(\.?\\D+)?') |
|------------------------------------------|
| 0                                        |
| 234.234                                  |
| 25                                       |
+------------------------------------------+
Pretty similar to what @kurt suggested, but without the leading "?:".
Note that \d+.\d+ means "one or more digits, followed by any character, followed by one or more digits". This pattern requires at least 3 characters to match, and the middle one doesn't even have to be a digit.
0 does not match that pattern. To match 0 or 42 or 1000000, i.e. any integer that is merely a string of digits, you only need \d+
\d+ alone does not capture 10.5 in full, however (it stops at 10), so if you also need to capture decimal values, you will need something more complex that handles digits followed by an optional decimal point and more digits:
\d+(?:\.\d+)?
This pattern will match 0, 0.0, 42, 98.6, etc.
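To make the difference between these patterns concrete, here is a minimal sketch in Python. It uses Python's `re` module rather than the SQL engine's regex functions, so treat it as an illustration of the patterns only; the sample strings and the helper name `first_match` are made up for this example. The last step mimics replacing NULLs with 0, the job a function such as COALESCE would do in the wider query.

```python
import re

# Hypothetical sample values, similar to the ones used in the answer above.
samples = ["test0me", "234.234", "test my age 25.", "no digits here"]

def first_match(pattern, text):
    """Return the first substring matching pattern, or None (like SQL NULL)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Unescaped dot: digit(s), ANY character, digit(s). A lone "0" cannot match.
loose = [first_match(r"\d+.\d+", s) for s in samples]

# Optional fractional part: integers and decimals both match.
strict = [first_match(r"\d+(?:\.\d+)?", s) for s in samples]

# Fall back to "0" when nothing matched at all.
with_default = [v if v is not None else "0" for v in strict]

print(loose)
print(strict)
print(with_default)
```

With the loose pattern only "234.234" is found; with the optional-fraction pattern "0", "234.234" and "25" are found, and only the string without digits needs the fallback.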
I'm getting familiar with PostgreSQL, but I'm having some trouble with pattern matching. I read the documentation and looked through other questions, but couldn't solve this on my own.
I have a field with lots of text data and, somewhere in the middle of it, numbers with this pattern:
"2021-1234567" (four digits, a hyphen, then seven digits)
The problem is that the text can contain other number sequences, like this:
"Project number 12345678912345 with id 2020-2583697 1456"
(in this case, I need to extract 2020-2583697)
In some cases it may be just eleven digits, like this:
"Project 12345678912345 sequence 20202583697 1456"
(in this case I need to extract 20202583697)
At first I tried to extract only the numbers (the text is mostly user input) with:
SELECT
SUBSTRING("my_field", '^[0-9]+$' )
FROM
my_table
That didn't help at all...
Can anyone help me?
This appears to do what you want:
select substring(str, '[0-9]{4}-?[0-9]{7}')
from (values ('asfasdf 2020-2583697 qererf i0iu0 1234234'),
('asfasdf 20202583697 qererf i0iu0 1234234')
) v(str)
It searches for 4 digits followed by an optional hyphen followed by 7 digits.
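One thing worth checking with this pattern is what happens when a longer digit run appears earlier in the text, as in the question's own examples. The sketch below uses Python's `re` module (not PostgreSQL) to show the behaviour. The guarded variant with digit lookarounds is an assumption of this example, not part of the answer above, and whether your database supports lookarounds should be verified.

```python
import re

# The two example strings from the question.
rows = [
    "Project number 12345678912345 with id 2020-2583697 1456",
    "Project 12345678912345 sequence 20202583697 1456",
]

# The pattern as given: 4 digits, optional hyphen, 7 digits.
PLAIN = re.compile(r"[0-9]{4}-?[0-9]{7}")

# Assumed variant: reject matches that sit inside a longer run of digits.
GUARDED = re.compile(r"(?<![0-9])[0-9]{4}-?[0-9]{7}(?![0-9])")

plain_hits = [PLAIN.search(r).group(0) for r in rows]
guarded_hits = [GUARDED.search(r).group(0) for r in rows]

print(plain_hits)    # first 11 digits of the 14-digit number in both rows
print(guarded_hits)  # the tokens the question asks for
```

The plain pattern picks up the start of the 14-digit number because that is the first place where eleven digits occur; the guarded one skips it.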
Or try this. I could not find a way to require blanks around the pattern without also returning those blanks, so this version uses REGEXP_REPLACE with a capture group instead:
WITH
indata(s) AS (
SELECT 'Project number 12345678912345 with id 2020-2583697 1456'
UNION ALL SELECT 'Project 12345678912345 sequence 20202583697 1456'
)
SELECT
REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1') AS found_token
, s
FROM indata;
 found_token  |                            s
--------------+---------------------------------------------------------
 2020-2583697 | Project number 12345678912345 with id 2020-2583697 1456
 20202583697  | Project 12345678912345 sequence 20202583697 1456
(2 rows)
The pattern used, REGEXP_REPLACE(s,'^.* (\d{4}-?\d{7}) .*$','\1'), reads as follows. ^.* followed by a blank matches the beginning of the string, any number of any characters, and then a blank. (\d{4}-?\d{7}) matches four digits, zero or one dash (-?), and seven digits; the parentheses around it mean "remember this as the first group". Finally, a blank followed by .*$ matches a blank and then any number of any characters up to the end of the string. The whole match is replaced with group 1, written \1.
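The same replace-with-group-1 idea can be sketched in Python with `re.sub`. This is only an illustration of the pattern, not of the database's REGEXP_REPLACE, and the function name `extract_token` is made up for the example. It also shows a limitation: when the pattern does not match (for instance because the token is at the very end with no trailing blank), the input comes back unchanged.

```python
import re

# Same pattern as in the answer: blank, token captured as group 1, blank.
PATTERN = r"^.* (\d{4}-?\d{7}) .*$"

def extract_token(s):
    """Replace the whole string with group 1; returns s unchanged if no match."""
    return re.sub(PATTERN, r"\1", s)

print(extract_token("Project number 12345678912345 with id 2020-2583697 1456"))
print(extract_token("Project 12345678912345 sequence 20202583697 1456"))
print(extract_token("id 2020-2583697"))  # no trailing blank, so no match
```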
In Redshift I want to return fields that contain numbers or special characters EXCEPT . (anything other than a-z and A-Z)
The following gets me anything that contains a number but I need to extend this to any special character except full stop (.)
SELECT DISTINCT name
FROM table
WHERE name ~ '[0-9]'
I need something like:
SELECT DISTINCT name
FROM table
WHERE name ~ '[0-9]' OR name ~'[,#';:#~[]{}etcetc'
Sample Data:
name
john
joh1n1
j!ohn!
jo!h2n
joh.n
jo.&hn
j.3ohn
j.$9ohn
Expected Output:
name
joh1n1
j!ohn!
jo!h2n
jo.&hn
j.3ohn
j.$9ohn
You may use
WHERE name !~ '^[[:alpha:].]+$'
Here, all records that do not consist of only alphabetic or dot symbols will be returned. ^ matches the start of a string position, [[:alpha:].]+ matches one or more letters or dots and $ matches the end of string position.
If it is for PostgreSQL you may use
WHERE name SIMILAR TO '%[^[:alpha:].]%'
The SIMILAR TO operator accepts POSIX character classes and bracket expressions and wildcards, too, and requires a full string match. So, % allows any chars before any 1 char other than letter or dot ([^[:alpha:].]), and then there may also be any other chars till the end of the string.
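The two formulations above ("not made up only of letters and dots" versus "contains some character that is neither a letter nor a dot") can be compared in a short Python sketch. Python's `re` has no POSIX classes such as [[:alpha:]], so the ASCII range [A-Za-z] is used here as a stand-in; that substitution is an assumption of this example.

```python
import re

# Sample data from the question.
names = ["john", "joh1n1", "j!ohn!", "jo!h2n", "joh.n", "jo.&hn", "j.3ohn", "j.$9ohn"]

# Reading 1: the name is NOT composed solely of letters and dots.
not_only_letters_or_dots = [
    n for n in names if re.fullmatch(r"[A-Za-z.]+", n) is None
]

# Reading 2: the name contains at least one character that is neither.
has_other_char = [n for n in names if re.search(r"[^A-Za-z.]", n)]

print(not_only_letters_or_dots)
print(has_other_char)
```

Both readings return the same six names for this data. They differ on an empty string: reading 1 keeps it (the `+` needs at least one character), while reading 2 drops it.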
You can do:
SELECT DISTINCT name FROM table WHERE name !~* '[a-z]'
This means: match on names that do not contain any alphabetic character.
Operator !~* means:
Does not match regular expression, case insensitive
Edit based on the provided sample data and expected results.
If you want to match on names that contain at least one character other than an alphabetic character or a dot, then you can do:
select * from mytable where name ~* '[^a-z.]'
Demo on DB Fiddle:
with mytable(name) as (values
('john'),
('joh1n1'),
('j!ohn!'),
('jo!h2n'),
('joh.n'),
('jo.&hn'),
('j.3ohn'),
('j.$9ohn')
)
select * from mytable where name ~* '[^a-z.]'
| name |
| :------ |
| joh1n1 |
| j!ohn! |
| jo!h2n |
| jo.&hn |
| j.3ohn |
| j.$9ohn |
I have this code:
SELECT REGEXP_REPLACE(name,'^name\[([[:alpha:][:space:][:digit:]]*)\|\|\|([[:alpha:]]*)\|\|\|([[[:alpha:][:space:][:punct:]]*)\|\|\|([[:digit:][:alpha:]]*)\|\|\|([[:digit:][:punct:]]*)\|\|\|([[:alpha:][:space:]]*)\|\|\|([[:alpha:]]*)\|\|\|([[:digit:]]*)\|\|\|([[:alpha:][:space:]]*)\|\|\|([[:alpha:]]*)\|\|\|([[:digit:][:alpha:]]*)\|\|\|([[:digit:][:alpha:][:space:]]*)\|\|\|([[:digit:][:alpha:]]*)\|\|\|([[:alpha:][:space:]]*)\|\|\|([[:alpha:]]*).*','[p1=\10]') as replaced
FROM Dual
Editor's note: the above is a single unreadable line. Here is the same regex with line breaks for readability:
SELECT REGEXP_REPLACE(name
,'^name\[([[:alpha:][:space:][:digit:]]*)\|\|\|
([[:alpha:]]*)\|\|\|
([[[:alpha:][:space:][:punct:]]*)\|\|\|
([[:digit:][:alpha:]]*)\|\|\|
([[:digit:][:punct:]]*)\|\|\|
([[:alpha:][:space:]]*)\|\|\|
([[:alpha:]]*)\|\|\|
([[:digit:]]*)\|\|\|
([[:alpha:][:space:]]*)\|\|\|
([[:alpha:]]*)\|\|\|
([[:digit:][:alpha:]]*)\|\|\|
([[:digit:][:alpha:][:space:]]*)\|\|\|
([[:digit:][:alpha:]]*)\|\|\|
([[:alpha:][:space:]]*)\|\|\|
([[:alpha:]]*).*'
,'[p1=\10]') as replaced
FROM Dual
I want to select the tenth position from it. I can select up to the ninth position, but I can't get the tenth with the logic above. Any ideas or help?
With [p1=\9] I can select the ninth position, but I want the tenth position's string from the expression above.
With [p1=\10] it selects the first position's value followed by a 0.
Any help?
Here's a very basic example of a string that matches your regex:
name[a|||b|||c|||d|||0|||e|||f|||1|||g|||h|||i|||j|||k|||l|||m
So, you want to return 'h', the tenth field, but \10 returns a0.
If you're only interested in the tenth capturing group and none of the previous ones, then you can just remove the brackets on all capturing groups up to that one, and then use \1.
UPDATE: OP wants 2,3,4,8,9,10 and 12th fields, so just add brackets for those fields.
Field | Capture Group number
====================================
2 | \1
3 | \2
4 | \3
8 | \4
9 | \5
10 | \6
12 | \7
The code:
select REGEXP_REPLACE(name
,'^name\[[[:alpha:][:space:][:digit:]]*\|\|\|
([[:alpha:]]*)\|\|\|
([[[:alpha:][:space:][:punct:]]*)\|\|\|
([[:digit:][:alpha:]]*)\|\|\|
[[:digit:][:punct:]]*\|\|\|
[[:alpha:][:space:]]*\|\|\|
[[:alpha:]]*\|\|\|
([[:digit:]]*)\|\|\|
([[:alpha:][:space:]]*)\|\|\|
([[:alpha:]]*)\|\|\|
[[:digit:][:alpha:]]*\|\|\|
([[:digit:][:alpha:][:space:]]*)\|\|\|
[[:digit:][:alpha:]]*\|\|\|
[[:alpha:][:space:]]*\|\|\|
[[:alpha:]]*.*','[p1=\1]') as replaced
FROM Dual
(Linebreaks added to the regex for clarity)
I should add that it looks like the broader question you're asking is how to get the tenth field from a triple-pipe delimited string in Oracle, which may be achievable in other ways that don't involve lengthy regexes like this.
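Both ideas, capturing only the fields you need and splitting on the delimiter, can be sketched in Python. This is an illustration and not Oracle code: the per-field character classes are simplified to "anything but a pipe" ([^|]*), which is an assumption of this example, and Python's replacement syntax handles \10 differently from Oracle.

```python
import re

# The sample string from the answer above: 15 triple-pipe delimited fields.
sample = "name[a|||b|||c|||d|||0|||e|||f|||1|||g|||h|||i|||j|||k|||l|||m"

# Fields the OP wants; only these get capturing parentheses.
wanted = [2, 3, 4, 8, 9, 10, 12]
FIELD = r"[^|]*"  # simplified stand-in for the original character classes
parts = [f"({FIELD})" if i in wanted else FIELD for i in range(1, 16)]
pattern = r"^name\[" + r"\|\|\|".join(parts) + r".*"

m = re.match(pattern, sample)
captured = dict(zip(wanted, m.groups()))  # field number -> captured text

# Field 10 is now capture group 6, so a single-digit backreference suffices.
replaced = re.sub(pattern, r"[p1=\6]", sample)

# Alternative without a long regex: strip the prefix and split on the delimiter.
tenth_by_split = sample[len("name["):].split("|||")[9]

print(captured)
print(replaced)
print(tenth_by_split)
```

The mapping in `captured` lines up with the field and capture group table above: field 10 ends up in group 6.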
If I have table contents that look like this:
id | value
------------
1 |CT 6510
2 |IR 52
3 |IRAB
4 |IR AB
5 |IR52
I need to get only those rows whose contents start with "IR" followed by a number (ignoring spaces). That means I should get these rows:
2 |IR 52
5 |IR52
because they start with "IR" and the next non-space character is a digit, unlike IRAB, which also starts with "IR" but has "A" as the next character. So far I've only been able to query everything starting with IR, so the other IR rows appear as well.
select * from public.record where value ilike 'ir%'
How do I do this? Thanks.
You can use the operator ~, which performs a regular expression matching.
e.g:
SELECT * from public.record where value ~ '^IR ?\d';
Add an asterisk to perform case-insensitive matching.
SELECT * from public.record where value ~* '^ir ?\d';
The symbols mean:
^: the beginning of the string
?: the character before it (here a space) is optional
\d: any single digit, equivalent to [0-9]
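The pattern can be checked against the table from the question with a small Python sketch. It uses Python's `re` in place of the ~ and ~* operators, with `re.IGNORECASE` standing in for the asterisk. The `\s*` variant, which allows any number of spaces, is an assumption added for this example.

```python
import re

# The table from the question: id -> value.
records = {1: "CT 6510", 2: "IR 52", 3: "IRAB", 4: "IR AB", 5: "IR52"}

def ids_matching(pattern, flags=0):
    """Return the ids whose value matches the pattern."""
    return [i for i, v in records.items() if re.search(pattern, v, flags)]

one_space = ids_matching(r"^IR ?\d")                         # like ~
case_insensitive = ids_matching(r"^ir ?\d", re.IGNORECASE)   # like ~*
any_spaces = ids_matching(r"^IR\s*\d")                       # assumed variant

print(one_space, case_insensitive, any_spaces)
```

All three return ids 2 and 5 here. The difference only shows with more than one space: "IR  52" (two spaces) is rejected by ` ?` but accepted by `\s*`.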
See for more info: Regular Expression Match Operators
See also this question, very informative: difference-between-like-and-in-postgres
I'm looking for a simple SQL (PostgreSQL) regular expression or similar solution (maybe soundex) that allows a flexible search, so that dashes, spaces and the like are ignored and only the raw characters are compared against the table:
Currently using:
SELECT * FROM Productions WHERE part_no ~* '%search_term%'
If the user types UTR-1, it fails to bring up UTR1 or UTR 1 stored in the database.
Matches do not happen when a part_no has a dash and the user omits this character (or vice versa).
Example: a search for part UTR-1 should find all of the matches below.
UTR1
UTR --1
UTR 1
any suggestions...
You may well find the official, built-in (from 8.3 at least) full-text search capabilities in PostgreSQL worth looking at:
http://www.postgresql.org/docs/8.3/static/textsearch.html
For example:
It is possible for the parser to produce overlapping tokens from the
same piece of text.
As an example, a hyphenated word will be reported both as the entire word
and as each component:
SELECT alias, description, token FROM ts_debug('foo-bar-beta1');
      alias      |               description                |     token
-----------------+------------------------------------------+---------------
 numhword        | Hyphenated word, letters and digits      | foo-bar-beta1
 hword_asciipart | Hyphenated word part, all ASCII          | foo
 blank           | Space symbols                            | -
 hword_asciipart | Hyphenated word part, all ASCII          | bar
 blank           | Space symbols                            | -
 hword_numpart   | Hyphenated word part, letters and digits | beta1
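SELECT *
FROM Productions
WHERE REGEXP_REPLACE(part_no, '[^[:alnum:]]', '', 'g')
= REGEXP_REPLACE('UTR-1', '[^[:alnum:]]', '', 'g')
Create an index on REGEXP_REPLACE(part_no, '[^[:alnum:]]', '', 'g') for this to work fast. In PostgreSQL the 'g' flag is needed so that every non-alphanumeric character is removed, not just the first one.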
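The normalize-both-sides comparison above can be sketched in Python. Note that Python's `re.sub` replaces every match by default, whereas whether REGEXP_REPLACE does so depends on the database and its flags. The ASCII class [^A-Za-z0-9] stands in for [^[:alnum:]], and the function name `normalize` and the extra part number "UTX-1" are made up for this example.

```python
import re

def normalize(part_no):
    """Strip everything that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", part_no)

# Stored values from the question, plus one hypothetical non-matching part.
stored = ["UTR1", "UTR --1", "UTR 1", "UTX-1"]
search_term = "UTR-1"

# Compare the normalized stored value with the normalized search term.
matches = [p for p in stored if normalize(p) == normalize(search_term)]

print(matches)
```

Because both sides go through the same normalization, it does not matter whether the dash is missing in the stored value or in the user's input.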