REGEXP_EXTRACT in hive or Impala to extract a Substring - sql

Hi i am new to hive i am using regexp_extract for getting substring from a string
my fixmessagestr is 10123=TICKET~}|167=CS~}|1=XTL9911~}|336=REG~}|10120= ~}|111=909~}|
how will I get XTL9911 using regexp_extract function. Need to get value for 1 Tag
I am using below and
select regexp_extract(fixmessagestr, '}1=(.*?)', 1) and it's giving null

In your string there is pipe | before tag- 1, not curly brace. Anyway both pipe and curly brace are special characters in regexp and should be shielded by double backslash (in Hive/Impala, in other databases it can be single backslash), also add end character, it is ~:
select regexp_extract('10123=TICKET~}|167=CS~}|1=XTL9911~}|336=REG~}|10120= ~}|111=909~}|', \
'\\|1=(.*?)~', 1)
Result:
XTL9911

Related

Athena/Presto Escape Underscore

I'm trying to escape an underscore in a like operator but not getting any results. I'm trying to find any rows with a value like 'aa_'.
WHERE value LIKE '%aa\\_%'
Use ESCAPE:
Wildcard characters can be escaped using the single character specified for the ESCAPE parameter.
WITH dataset (str) AS (
VALUES ('aa_1'),
('aa_2'),
('aa1')
)
SELECT *
FROM dataset
WHERE str like 'aa\_%' ESCAPE '\'
Output:
str
aa_1
aa_2

Snowflake SQL Regex

I am trying to identify a value that is nested in a string using Snowflakes regexp_substr()
The value that I want to access is in quotes:
...
Type:
value: "CategoryA"
...
Edit: This text is nested in a much larger portion of text.
I want to extract CategoryA for all columns using regexp_substr. But I am unsure how.
I have tried:
regexp_substr(col, 'Type\\W+(\\w+)\\W+\\w.+')
and while that gives the portion of the string, I just want what is in quotes and can't figure out how to do so.
You could use regexp_replace() instead:
regexp_replace(col, '(^[^"]*")|("[^"]*$)", '')
The regexp matches on both following conditions, and replaces matching parts with the empty string:
^[^"]*": everything from the beginning of the string to the first double quote
("[^"]*$)": everything from the last double quote to the end of the string

How can I replace a string pattern with blank in hive?

I have a string as:
https://maps.googleapis.com/maps/api/staticmap?center=41.892532+-87.63811&zoom=11&scale=2&size=280x320&maptype=roadmap&format=png&visual_refresh=true%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:1%7C2413+S+State+St++Chicago+IL+60616%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:2%7C3000+N+Halsted+St++Chicago+IL+60657%7C&markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&key=AIzaSyBNEAQcC5niAEeiP3zkA_nuWGvtl0IOEs4
I want to replace the '++++' pattern at the end with blank and not the single occurrence of '+'. Tried using regexp_replace and translate functions in hive but that replaces all the single occurrences of '+' as well.
Use
regexp_replace(string,'[+]{4}','')
Pattern '[+]{4}' means + caracter four times.
Test:
select regexp_replace('++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C++++&','[+]{4}','');
Result:
OK
++markers=size:mid%7Ccolor:0x8000ff%7Clabel:3%7C&
Dod you try this?
replace(string, '++++', '')
Admittedly, this will replace all occurrences of '++++', but your string only has one of them.

How to escape '|' in SQL Server or T-SQL Query?

SELECT *
FROM performance_table
WHERE ad_group like '%|%'
I have no idea on how to escape the Pipe operator here.
You don't need to escape | in T-SQL as it has no special meaning inside like. However, if for example you would like to find texts containing % character, what you're looking for is:
SELECT *
FROM performance_table
WHERE ad_group like '%#%%' escape '#'
where escape defines escape character.
The pipe character does not need to be escaped.
Your query will find all records that contain a pipe character in the ad_group column.
When used inside a string literal ('|'), the character is not treated as an operator. Its function as an operator is bitwise OR, as for example in
select 8|3
will be 11.

Teradata substring out of bounds

I'm having issues figuring out the bounds between a substring. For example for the string 063016_shape_tea_cleanse__emshptea1_I want to substring out emshptea1, but it also has to work for the string 063016_shape_tea_cleanse__emshptea1_TESTDATA_HERE.
Currently I have:
sel SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_')+2,
POSITION('_' IN SUBSTR('063016_shape_tea_cleanse__emshptea1_',POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2,CHARACTER_LENGTH('063016_shape_tea_cleanse__emshptea1_') - (POSITION('__' IN '063016_shape_tea_cleanse__emshptea1_') + 2)))-1)
But that is erroring out due to it trying to substring 27 to -1.
You might use a regular expression, this will extract everything between __ and the following _ or end of string:
REGEXP_SUBSTR(col, '(?<=__).+?(?=(_|$))')
'(?<= )' is a look-behind, i.e search for previous characters without adding it to the result. Here: search for __
'.+' matches any character, one or multiple times. This would match until the end of the string ("greedy"), '?' ("lazy") prevents that.
'(?= )' is a look-ahead, i.e. search for following characters without adding it to the result.
( | ) The pipe splits an expression in multiple alternatives. Here either an underscore character or the end of the string $