How can I replace a string pattern with blank in hive? - sql

I have a string as:
https://maps.googleapis.com/maps/api/staticmap?center=41.892532+-87.63811&zoom=11&scale=2&size=280x320&maptype=roadmap&format=png&visual_refresh=true%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:1%7C2413+S+State+St++Chicago+IL+60616%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:2%7C3000+N+Halsted+St++Chicago+IL+60657%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&key=AIzaSyBNEAQcC5niAEeiP3zkA_nuWGvtl0IOEs4
I want to replace the '++++' pattern at the end with blank and not the single occurrence of '+'. Tried using regexp_replace and translate functions in hive but that replaces all the single occurrences of '+' as well.

Use
regexp_replace(string,'[+]{4}','')
Pattern '[+]{4}' means + caracter four times.
Test:
select regexp_replace('++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&','[+]{4}','');
Result:
OK
++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&

Dod you try this?
replace(string, '++++', '')
Admittedly, this will replace all occurrences of '++++', but your string only has one of them.

Related

Snowflake SQL Regex

I am trying to identify a value that is nested in a string using Snowflakes regexp_substr()
The value that I want to access is in quotes:
...
Type:
value: "CategoryA"
...
Edit: This text is nested in a much larger portion of text.
I want to extract CategoryA for all columns using regexp_substr. But I am unsure how.
I have tried:
regexp_substr(col, 'Type\\W+(\\w+)\\W+\\w.+')
and while that gives the portion of the string, I just want what is in quotes and can't figure out how to do so.
You could use regexp_replace() instead:
regexp_replace(col, '(^[^"]*")|("[^"]*$)", '')
The regexp matches on both following conditions, and replaces matching parts with the empty string:
^[^"]*": everything from the beginning of the string to the first double quote
("[^"]*$)": everything from the last double quote to the end of the string

Postgresql: Extracting substring after first instance of delimiter

I'm trying to extract everything after the first instance of a delimiter.
For example:
01443-30413 -> 30413
1221-935-5801 -> 935-5801
I have tried the following queries:
select regexp_replace(car_id, E'-.*', '') from schema.table_name;
select reverse(split_part(reverse(car_id), '-', 1)) from schema.table_name;
However both of them return:
01443-30413 -> 30413
1221-935-5801 -> 5801
So it's not working if delimiter appears multiple times.
I'm using Postgresql 11. I come from a MySQL background where you can do:
select SUBSTRING(car_id FROM (LOCATE('-',car_id)+1)) from table_name
Why not just do the PG equivalent of your MySQL approach and substring it?
SELECT SUBSTRING('abcdef-ghi' FROM POSITION('-' in 'abcdef-ghi') + 1)
If you don't like the "from" and "in" way of writing arguments, PG also has "normal" comma separated functions:
SELECT SUBSTR('abcdef-ghi', STRPOS('abcdef-ghi', '-') + 1)
I think that regexp_replace is appropriate, but using the correct pattern:
select regexp_replace('1221-935-5801', E'^[^-]+-', '');
935-5801
The regex pattern ^[^-]+- matches, from the start of the string, one or more non dash characters, ending with a dash. It then replaces with empty string, effectively removing this content.
Note that this approach also works if the input has no dashes at all, in which case it would just return the original input.
Use this regexp pattern :
select regexp_replace('1221-935-5801', E'^[^-]+-', '') from schema.table_name
Regexp explanation :
^ is the beginning of the string
[^-]+ means at least one character different than -
...until the - character is met
I tried it in a conventional way in general what we do (found
something similar to instr as strpos in postgrsql .) Can try the below
SELECT
SUBSTR(car_id,strpos(car_id,'-')+1,
length(car_id) ) from table ;

replace all occurrences of a sub string between 2 charcters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
also, it's important to consider if you might have instances that don't contain - and what would be the output in this case
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! Pattern .*- will replace every character until it finds last occurence of - with text 140.

Replace multiple repeating character to one

I have a varchar column, and each field contains a single word, but there are random number of pipe character before and after the word.
Something like this:
MyVarcharColumn
'|||Apple|||||'
'|||||Pear|||||'
'||Leaf|'
When I query the table, I wish to replace the multiple pipes to a single one, so the result would be like this:
MyVarcharColumn
'|Apple|'
'|Pear|'
'|Leaf|'
Cannot figure out how to solve it with REPLACE function, anybody knows?
vkp's method absolutely solves your issue. Another method that works, and also will work in a variety of other situations, is using a triple REPLACE()
SELECT REPLACE(REPLACE(REPLACE('|||Apple|||||', '|', '><'), '<>',''), '><','|')
This method will allow you to keep a delimiter between multiple strings where Mr. VPK's method will concat the strings and put a delim at the very beginning and the very end.
SELECT REPLACE(REPLACE(REPLACE('|||Apple|||||Banana||||||||||', '|', '><'), '<>',''), '><','|')
One way is to replace all the | with blanks and add a pipe character at the beginning and the end of string.
select '|'+replace(mycolumn,'|','')+'|' from tablename

Cut string after first occurrence of a character

I have strings like 'keepme:cutme' or 'string-without-separator' which should become respectively 'keepme' and 'string-without-separator'. Can this be done in PostgreSQL? I tried:
select substring('first:last' from '.+:')
But this leaves the : in and won't work if there is no : in the string.
Use split_part():
SELECT split_part('first:last', ':', 1) AS first_part
Returns the whole string if the delimiter is not there. And it's simple to get the 2nd or 3rd part etc.
Substantially faster than functions using regular expression matching. And since we have a fixed delimiter we don't need the magic of regular expressions.
Related:
Split comma separated column data into additional columns
regexp_replace() may be overload for what you need, but it also gives the additional benefit of regex. For instance, if strings use multiple delimiters.
Example use:
select regexp_replace( 'first:last', E':.*', '');
SQL Select to pick everything after the last occurrence of a character
select right('first:last', charindex(':', reverse('first:last')) - 1)