Hive Regular expression - only portion of string needed - sql

Hi, I am trying to extract a portion of the data from one column in my Hive table, but the portion I need does not start at a fixed position.
select value4,regexp_extract(value4,'*****',0) from hive_table;
column value is shown below
grade:data:home made;Cat;dinnerbox_grade_Enroll
list:date:may;animal;dinnerbox_list_value
cgrade:made_data;dinnerbox_cgrade_notEnroll
I want the data from dinnerbox to the end of the string.
Can anyone help with this?

It is a pretty simple regular expression
.*dinnerbox(.*?)$
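The capture group uses a non-greedy wildcard, but anchoring it to the end of the line with $ forces it to take everything after dinnerbox. The greedy .* in front makes sure that the last dinnerbox in the value is the one that gets matched.
You want capture group 1, so pass 1 (not 0) as the third argument to regexp_extract.
To get rid of the leading _ you can use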
.*dinnerbox_(.*?)$
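As a quick way to check the pattern outside Hive, here is a minimal Python sketch run against the three sample values from the question. Python's `re` module stands in for Hive's Java-based regex engine (an assumption that should hold for a pattern this simple), and `extract_after_dinnerbox` is a hypothetical helper name, not anything Hive provides.

```python
import re

# Same pattern as above: greedy prefix, literal "dinnerbox_",
# then a lazy capture that is forced to run to the end of the string.
PATTERN = re.compile(r".*dinnerbox_(.*?)$")


def extract_after_dinnerbox(value):
    """Return capture group 1, or None when "dinnerbox_" is absent."""
    match = PATTERN.match(value)
    return match.group(1) if match else None


samples = [
    "grade:data:home made;Cat;dinnerbox_grade_Enroll",
    "list:date:may;animal;dinnerbox_list_value",
    "cgrade:made_data;dinnerbox_cgrade_notEnroll",
]

for sample in samples:
    print(extract_after_dinnerbox(sample))
# grade_Enroll
# list_value
# cgrade_notEnroll
```

In Hive itself the equivalent call would presumably be regexp_extract(value4, '.*dinnerbox_(.*?)$', 1).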

Related

Get last occurrence for the string after '/' in redshift

I am fairly new to regular expressions and have always had trouble following them. It would be really helpful to get an answer to the following problem.
I have a column of strings in a Redshift table and want to extract the part of the string after the last '/'. For example, the column customer_site contains https://hello.com/my_first_website, and from this I want to extract my_first_website. Can someone suggest a regular expression that does this?
You can use regexp_substr function such as
SELECT regexp_substr('https://hello.com/my_first_website','[^/]*$')
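To see why '[^/]*$' returns the last path segment, the following Python sketch applies the same pattern with `re.search`. This only illustrates the regex logic; it is not Redshift code, and `last_segment` is a hypothetical helper name.

```python
import re


def last_segment(url):
    """Return the run of non-slash characters that ends the string."""
    # [^/]* matches as many non-slash characters as possible, and $ pins
    # the match to the end, so it can only start after the last '/'.
    return re.search(r"[^/]*$", url).group(0)


print(last_segment("https://hello.com/my_first_website"))  # my_first_website
```

One consequence of the pattern: a value that ends in '/' yields an empty string, because nothing follows the last slash.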

Get multiple values with brackets from rows in SQL Server

I have rows containing data like this in column called ERROR_CODE:
00111[2003] Maschine0; 000222[2003] Maschinen2
I need to filter out only values in the brackets like this in one row:
2003;2003
I have one solution but only to get first element. And I would need all of them...like 2003,2003
SUBSTRING(ERROR_CODE,CHARINDEX('[',ERROR_CODE)+1 ,CHARINDEX(']',ERROR_CODE)-CHARINDEX('[',ERROR_CODE)-1)
Could you please help me find a solution?
This is based on several assumptions:
Each error is semicolon (;) delimited
An error always contains one value in brackets ([])
You are using a fully supported version of SQL Server.
One method to achieve this would be to split your string on the delimiter (;). Then, in each piece, you can find the position of the left bracket ([) and the right bracket (]) and use SUBSTRING to get the content between them. Finally, you can string aggregate to get 1 row (per value of your column) again:
SELECT STRING_AGG(SUBSTRING(SS.[value],CI.LB +1, CI.RB - CI.LB - 1),',')
FROM (VALUES('00111[2003] Maschine0; 000222[2003] Maschinen2'))V(YourColumn)
CROSS APPLY STRING_SPLIT(V.YourColumn,';') SS
-- locate the brackets in each split piece, not in the whole column value
CROSS APPLY (VALUES(CHARINDEX('[',SS.[value]),CHARINDEX(']',SS.[value])))CI(LB,RB)
GROUP BY V.YourColumn;
For point 3, if you are not using a fully supported version of SQL Server, you will need to use a user-defined (set-based or CLR) string splitter and FOR XML PATH, respectively, for splitting and aggregating your strings. If either 1 or 2 is not true, you have a far more fundamental problem with your design than you let on; fix your design.
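The split, locate, and aggregate steps described in this answer can be sketched in Python as follows. This mirrors the logic only and is not T-SQL; `bracket_values` is a hypothetical helper name, and the sketch relies on the same assumptions as the answer (semicolon-delimited errors, one bracketed value per error).

```python
def bracket_values(error_code, delimiter=";"):
    """Collect the text between [ and ] from each delimited piece."""
    values = []
    for part in error_code.split(delimiter):
        lb = part.find("[")          # position of the left bracket
        rb = part.find("]", lb + 1)  # first right bracket after it
        if lb != -1 and rb != -1:
            values.append(part[lb + 1:rb])
    # Aggregate back into one comma-separated string per input value.
    return ",".join(values)


print(bracket_values("00111[2003] Maschine0; 000222[2003] Maschinen2"))
# 2003,2003
```

Pieces without a complete pair of brackets are skipped here, which is a design choice of this sketch rather than something the answer specifies.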

Oracle SQL REGEXP_SUBSTR() and SUBSTR() issue

I am trying to modify a query that I have created to bring back only records that DO NOT start with the string 'PO:'
The records I want are all different in the way they start but none of the ones I want start with 'PO:'. Some may start with numbers or some may start with other words.
I know that REGEXP_SUBSTR() or just SUBSTR() can pull back data based solely on numbers or letters, but how do I exclude certain words or strings?
Any help is appreciated!
You can use NOT LIKE,
e.g.: yourcolumn NOT LIKE 'PO:%'
This accepts all values which do not begin with 'PO:'.

SQL Server regular expression select and update

I have a column that I need to clean the data up on.
First I'd like to run a select to get a record of the bad data, then I'd like to run a replace on the invalid characters.
I'm looking to select anything that contains non-alphanumeric characters, ignoring the backslash "\" when it is the second character and also ignoring underscores and dashes in the rest of the string. Here are a couple of examples of the data I'm expecting to get back from this query.
#\AAA
A\Adam's
A\Amanda.Smith
B\Bear's-ltd
C\Couple & More
After this I'd like to run a replace on any of these invalid characters and replace them with underscores so the result would look like this:
_\AAA
A\Adam_s
A\Amanda_Smith
B\Bear_s-ltd
C\Couple_More
I do not think there is native support for that. You can create a CLR function to add regex support, e.g.: https://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/
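To make the requested rule concrete, here is a minimal Python sketch of the select-then-replace logic, checked against the question's sample rows. It is not T-SQL or CLR code; `clean_name` and `is_bad` are hypothetical names, and replacing a whole run of invalid characters with a single underscore is an assumption inferred from the "C\Couple & More" example.

```python
import re

# Anything that is not a letter, digit, underscore or dash is invalid.
INVALID_RUN = re.compile(r"[^A-Za-z0-9_-]+")


def clean_name(value):
    """Replace runs of invalid characters with one underscore,
    leaving a backslash in second position untouched."""
    if value[1:2] == "\\":
        head, tail = value[:1], value[2:]
        return INVALID_RUN.sub("_", head) + "\\" + INVALID_RUN.sub("_", tail)
    return INVALID_RUN.sub("_", value)


def is_bad(value):
    """A row counts as bad data when cleaning would change it."""
    return clean_name(value) != value


for row in ["#\\AAA", "A\\Adam's", "A\\Amanda.Smith", "B\\Bear's-ltd", "C\\Couple & More"]:
    print(row, "->", clean_name(row))
```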

How do I check the end of a particular string using SQL pattern matching?

I am trying to use sql pattern matching to check if a string value is in the correct format.
The string code should have the correct format of:
alphanumericvalue.alphanumericvalue
Therefore, the following are valid codes:
D0030.2190
C0052.1925
A0025.2013
And the following are invalid codes:
D0030
.2190
C0052.
A0025.2013.
A0025.2013.2013
So far I have the following SQL IF clause to check that the string is correct:
IF @vchAccountNumber LIKE '_%._%[^.]'
I believe that the "_%" part checks for 1 or more characters. Therefore, this statement checks for one or more characters, followed by a "." character, followed by one or more characters and checking that the final character is not a ".".
It seems that this would work for all combinations except for the following format which the IF clause allows as a valid code:
A0025.2013.2013
I'm having trouble correcting this IF clause to allow it to treat this format as incorrect. Can anybody help me to correct this?
Thank you.
This stackoverflow question mentions using word-boundaries: [[:<:]] and [[:>:]] for whole word matches. You might be able to use this since you don't have spaces in your code.
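This solution uses only LIKE (the bracketed character classes are a SQL Server extension to LIKE).
This LIKE expression is meant to find any pattern that is not alphanumeric.alphanumeric, so NOT LIKE keeps only the values that match the format you want:
IF @vchAccountNumber NOT LIKE '%[^A-Z0-9].[^A-Z0-9]%'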
However, based on your examples, you can use this...
LIKE '[A-Z][0-9][0-9][0-9][0-9].[0-9][0-9][0-9][0-9]'
...or one like this if you have 5 alphanumerics, a dot, then 4 alphanumerics
LIKE '[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9].[A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9]'
The 2nd one is slightly more obvious for fixed-length values. The 1st one is slightly less intuitive but works with variable-length codes on either side of the dot.
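For comparison, the target format from the question (alphanumerics, exactly one dot, alphanumerics) can be written as a single anchored regular expression. The Python sketch below is an illustration outside SQL Server, not a replacement for the LIKE patterns above; `is_valid_code` is a hypothetical name, and treating both halves as variable-length is an assumption.

```python
import re

# One or more alphanumerics, a literal dot, one or more alphanumerics.
CODE_FORMAT = re.compile(r"[A-Za-z0-9]+\.[A-Za-z0-9]+")


def is_valid_code(code):
    """True only when the whole string has the form alnum.alnum."""
    return CODE_FORMAT.fullmatch(code) is not None


valid = ["D0030.2190", "C0052.1925", "A0025.2013"]
invalid = ["D0030", ".2190", "C0052.", "A0025.2013.", "A0025.2013.2013"]

print([is_valid_code(c) for c in valid])    # [True, True, True]
print([is_valid_code(c) for c in invalid])  # [False, False, False, False, False]
```

Because `fullmatch` requires the entire string to fit the pattern, a second dot (as in A0025.2013.2013) is rejected without a separate check.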
Other SO questions: "Creating a Function in SQL Server with a Phone Number as a parameter and returns a Random Number" and "Best equivalent for IsInteger in SQL Server"