Postgresql: Extracting substring after first instance of delimiter - sql

I'm trying to extract everything after the first instance of a delimiter.
For example:
01443-30413 -> 30413
1221-935-5801 -> 935-5801
I have tried the following queries:
select regexp_replace(car_id, E'-.*', '') from schema.table_name;
select reverse(split_part(reverse(car_id), '-', 1)) from schema.table_name;
However both of them return:
01443-30413 -> 30413
1221-935-5801 -> 5801
So it's not working if delimiter appears multiple times.
I'm using Postgresql 11. I come from a MySQL background where you can do:
select SUBSTRING(car_id FROM (LOCATE('-',car_id)+1)) from table_name

Why not just do the PG equivalent of your MySQL approach and substring it?
SELECT SUBSTRING('abcdef-ghi' FROM POSITION('-' in 'abcdef-ghi') + 1)
If you don't like the "from" and "in" way of writing arguments, PG also has "normal" comma separated functions:
SELECT SUBSTR('abcdef-ghi', STRPOS('abcdef-ghi', '-') + 1)

I think that regexp_replace is appropriate, but using the correct pattern:
select regexp_replace('1221-935-5801', E'^[^-]+-', '');
935-5801
The regex pattern ^[^-]+- matches, from the start of the string, one or more non dash characters, ending with a dash. It then replaces with empty string, effectively removing this content.
Note that this approach also works if the input has no dashes at all, in which case it would just return the original input.

Use this regexp pattern :
select regexp_replace('1221-935-5801', E'^[^-]+-', '') from schema.table_name
Regexp explanation :
^ is the beginning of the string
[^-]+ means at least one character different than -
...until the - character is met

I tried it in a conventional way in general what we do (found
something similar to instr as strpos in postgrsql .) Can try the below
SELECT
SUBSTR(car_id,strpos(car_id,'-')+1,
length(car_id) ) from table ;

Related

Get rows which contain exactly one special character

I have a SQL query which returns some rows having the below format:
DB_host
DB_host_instance
How can i filter to get rows which only have the format of 'DB_host' (place a condition to return values with only one occurrence of '_')
i tried using [0-9a-zA-Z_0-9a-zA-Z], but seems like its not right. Please suggest.
One option would be using REGEXP_COUNT and at most one underscore is needed then use
WHERE REGEXP_COUNT( col, '_' ) <= 1
or strictly one underscore should exist then use
WHERE REGEXP_COUNT( col, '_' ) = 1
A simple method is a regular expression:
where regexp_like(col, '^[^_]+_[^_]+$')
This matches the full string when there is a string with no underscores followed by an underscore followed by another string with no underscores.
You could also do this with LIKE, but it is more complicated:
where col like '%\_%' and col not like '%\_%\_%'
That is, has one underscore but not two. The \ is needed because _ is a wildcard for LIKE patterns.
You can suppress underscores in the string, and ensure that the length of the result is just one character less than the original:
where len(replace(col, '_', '')) = len(col) - 1
I wonder how this method would compare to a regex or two likes in terms of efficiency on a large dataset. I would not be surprised it it was more efficient.

How can I replace a string pattern with blank in hive?

I have a string as:
https://maps.googleapis.com/maps/api/staticmap?center=41.892532+-87.63811&zoom=11&scale=2&size=280x320&maptype=roadmap&format=png&visual_refresh=true%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:1%7C2413+S+State+St++Chicago+IL+60616%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:2%7C3000+N+Halsted+St++Chicago+IL+60657%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&key=AIzaSyBNEAQcC5niAEeiP3zkA_nuWGvtl0IOEs4
I want to replace the '++++' pattern at the end with blank and not the single occurrence of '+'. Tried using regexp_replace and translate functions in hive but that replaces all the single occurrences of '+' as well.
Use
regexp_replace(string,'[+]{4}','')
Pattern '[+]{4}' means + caracter four times.
Test:
select regexp_replace('++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&','[+]{4}','');
Result:
OK
++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
Dod you try this?
replace(string, '++++', '')
Admittedly, this will replace all occurrences of '++++', but your string only has one of them.

SQL Substring \g

I would just like to know where do I put the \g in this query?
SELECT project,
SUBSTRING(address FROM 'A-Za-z') AS letters,
SUBSTRING(address FROM '\d') AS numbers
FROM repositories
I tried this but this brings back nothing (it doesn't throw an error though)
SELECT project,
SUBSTRING(CONCAT(address, '#') FROM 'A-Za-z' FOR '#') AS letters,
SUBSTRING(CONCAT(address, '#') FROM '\d' FOR '#') AS numbers
FROM repositories
Here is an example: I would like the string 1DDsg6bXmh3W63FTVN4BLwuQ4HwiUk5hX to return DDsgbXmhWFTVNBLwuQHwiUkhX. So basically return all the letters...and then my second one is to return all the numbers.
The g (“global”) modifier in regular expressions indicates that all matches rather than only the first one should be used.
That doesn't make much sense in the substring function, which returns only a single value, namely the first match. So there is no way to use g with substring.
In those functions where it makes sense in PostgreSQL (regexp_replace and regexp_matches), the g can be specified in the optional last flags parameter.
If you want to find all substrings that match a pattern, use regexp_matches.
For your example, which really has nothing to do with substring at all, I'd use
SELECT translate('1DDsg6bXmh3W63FTVN4BLwuQ4HwiUk5hX', '0123456789', '');
translate
---------------------------
DDsgbXmhWFTVNBLwuQHwiUkhX
(1 row)
So this is not pure SQL but Postgresql, but this also does the job:
SELECT project,
regexp_replace(address, '[^A-Za-z]', '', 'g') AS letters,
regexp_replace(address, '[^0-9]', '', 'g') AS numbers
FROM repositories;

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.
Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
regexp_replace() may be overload for what you need, but it also gives the additional benefit of regex. For instance, if strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
SQL Select to pick everything after the last occurrence of a character
select right('first:last', charindex(':', reverse('first:last')) - 1)