Add a character in a string at certain location based on logic in SQL Server

I have comma-separated data like this in one of the columns:
48FGTG,100ERTD,18NH,07EWR,9FDC,2POANAR,100GTEDC
46FGTG,78ERTD,67NH,76EWR,3FDC
The leading number in each item is a percentage, ranging from 0 to 100; everything from the first alphabetic character onward is the label.
I have to update the data to look like this:
48% FGTG,100% ERTD,18% NH,07% EWR,9% FDC,2% POANAR,100% GTEDC
46% FGTG,78% ERTD,67% NH,76% EWR,3% FDC
I can pick out the percentage with a regex, but I am not sure how to use that in SQL. Any lead would be helpful.

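You can do it like this for one row's value (note the space after the percent sign, which the expected output requires):
select STRING_AGG(substring(value,0,PATINDEX('%[^0-9]%',value))+'% '+substring(value,PATINDEX('%[^0-9]%',value),len(value)),',') from string_split('48FGTG,100ERTD,18NH,07EWR,9FDC,2POANAR,100GTEDC',',')
Be aware that string_split does not guarantee the order of the rows it returns, so the items may not come back in their original order.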
Here's what I have done to each item returned by STRING_SPLIT:
1.Use PATINDEX to find the position of the first non-digit character
2.Use SUBSTRING to extract the leading number and then the remaining string, with the percent sign placed between them
3.Use STRING_AGG to concatenate the pieces back together, separated by commas
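To make these steps concrete outside the database, here is a minimal Python sketch of the same logic: split on commas, find the first non-digit, splice in the percent sign, and join again. The function name add_percent is hypothetical, and the sketch only mirrors the idea of the T-SQL above; it is not a replacement for it.

```python
import re


def add_percent(row: str) -> str:
    """Insert '% ' between the leading digits and the label of each item."""
    parts = []
    for item in row.split(","):
        # Position of the first non-digit character (the PATINDEX step).
        match = re.search(r"[^0-9]", item)
        pos = match.start() if match else len(item)
        # Leading number + '% ' + remaining label (the SUBSTRING step).
        parts.append(item[:pos] + "% " + item[pos:])
    # Re-join with commas (the STRING_AGG step).
    return ",".join(parts)


print(add_percent("46FGTG,78ERTD,67NH,76EWR,3FDC"))
```

Unlike STRING_SPLIT, Python's split keeps the items in their original order, so no extra ordering step is needed here.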

Related

Oracle SQL regex: extract every instance of a string and preceding/following characters

I'm pulling data from an Oracle CLOB field containing tens of thousands of characters. The data look like this:
...
196|9900000296567|V|
197|S05S53499|D|
198|TO|20170128000000|50118.0|||T|N|
196|9900009777884|V|
197|H02FC07599|D|
198|01|20170128000000|64452.0|||T|N|
198|02|20170128000000|14235.0|||T|N|
196|9900014386487|V|
197|S10C20599|D|
198|1|20170128000000|6246.0|||T|N|
196|9900015184256|V|
197|S13G44199|D|
198|L|20170128000000|1731.0|||T|N|
198|N|20170128000000|5915.0|||T|N|
196|9900018826270|V|
197|S10C20599|D|
198|01|20170128000000|3678.0|||T|N|
198|02|20170128000000|25286.0|||T|N|
...
I want to extract every occurrence of a string (e.g. S10C20599) with the preceding 25 characters and following 75 characters. If this bit is not possible I'd happily settle for the same number of preceding and following characters. I don't care if I get overlaps in the extracted data, and the code should not error if the search string occurs <25 characters from the beginning of the file or <75 characters from the end.
Thanks for any tips.
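If there is only one value, you can use:
select regexp_substr(col, '.{0,25}S10C20599.{0,75}', 1, 1, 'n')
The quantifier bounds are separated by a comma ({0,25}, not {0-25}), and the 'n' match parameter lets . match newlines, which matters because the CLOB spans many lines.
Otherwise, you need some sort of recursive or hierarchical query to fetch multiple values from a single string.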
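If pulling the CLOB into client code is an option, the multi-occurrence case can also be handled there. The following is a minimal Python sketch, not an Oracle solution: it scans for every occurrence and slices the surrounding context, clamping at both ends of the text so it does not fail near the start or end. The function name contexts and its parameters are hypothetical.

```python
import re


def contexts(text: str, needle: str, before: int = 25, after: int = 75) -> list:
    """Return each occurrence of needle with up to `before` preceding
    and `after` following characters. Overlapping windows are allowed."""
    results = []
    for match in re.finditer(re.escape(needle), text):
        start = max(0, match.start() - before)      # clamp at start of text
        end = min(len(text), match.end() + after)   # clamp at end of text
        results.append(text[start:end])
    return results


sample = "196|9900014386487|V|\n197|S10C20599|D|\n198|1|20170128000000|6246.0|||T|N|\n"
for chunk in contexts(sample, "S10C20599"):
    print(repr(chunk))
```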

How can I find two extra characters in DB2 and list them in a column?

I have written this expression for checking extra characters and I am counting the occurrence of those extra characters.
REGEXP_COUNT('Mr.John® Êlite', regexp_extract ('Mr.John® Êlite','[^\x00-\x7F]'))
It works fine if the string has only one extra character, e.g.
Mr. John®
It extracts ® and gives me a count of 1.
But if my string has two extra characters, it only picks up the first one and ignores the second, e.g.
Mr.John® Êlite
My expression extracts ® and ignores Ê.
I have tried a subquery as well, but that did not work either. I need help.
As noted by Wiktor Stribiżew, REGEXP_COUNT needs just a source string and a regular expression:
db2 "values REGEXP_COUNT('Mr.John® Êlite', '[^\x00-\x7F]')"
1
-----------
2
Because you used REGEXP_EXTRACT, only the first occurrence is extracted:
The REGEXP_EXTRACT scalar function returns one occurrence of a substring of a string that matches the regular expression pattern.
Only that single extracted character is then passed to REGEXP_COUNT as the pattern, so only its occurrences are counted.
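The difference between "first match" and "all matches" can be illustrated outside DB2. This minimal Python sketch uses the same character class to both count and list the non-ASCII characters; the function name non_ascii_chars is hypothetical, and Python's re module is only a stand-in for the DB2 regex functions.

```python
import re


def non_ascii_chars(text: str) -> list:
    """Return every character outside the 7-bit ASCII range, in order."""
    return re.findall(r"[^\x00-\x7F]", text)


# 'Mr.John® Êlite' written with escapes to avoid encoding ambiguity.
sample = "Mr.John\u00ae \u00calite"
found = non_ascii_chars(sample)
print(len(found), found)
```

Here the list plays the role of the "column of extra characters" and its length plays the role of REGEXP_COUNT.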

Regular expression - capture number between underscores within a sequence between commas

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This format is uniform across the entire field: three underscore-separated numeric values, a comma, three more underscore-separated numeric values, another comma, and then three underscore-separated text values, with no spaces in between.
I want to extract the middle value from the second numeric sequence; in the example above I want to get 444.
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to identify the first comma, first number after and the following underscore ,222_ to use as a starting point and from there get the next number without the _ after it
This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten
We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
The first capture group of the regex above contains the value you want (444 for the sample input).
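As a quick sanity check of the pattern outside the database, here is a minimal Python sketch that applies the same regex and replacement. Python's re module is not the database's regex engine, so this only confirms the pattern's logic under Python semantics; the function name middle_of_second is hypothetical.

```python
import re

PATTERN = r"^[^,]+,[^_]+_(.*?)_[^_]+,.*$"


def middle_of_second(value: str) -> str:
    """Replace the whole string with capture group 1, as REGEXP_REPLACE does."""
    return re.sub(PATTERN, r"\1", value)


print(middle_of_second("111_2222_33333,222_444_3,aaa_bbb_ccc"))
```

The pattern skips everything up to the first comma, then the first underscore-terminated field, and captures lazily up to the next underscore.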

Hive SQL Extract string of varying length between two non-alphanumeric characters

I would like to extract strings of varying length located between two repeating underscores in Hive QL. Below I show a sampling of the pattern of the rows. Specifically, I would like to extract the string between the 3rd and 4th underscores. Thanks!
2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250
2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600
2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600
Just kept trying and I modified this from previous responses to non-Hive SQL. I am still interested in knowing better ways of doing this. Note that creative_str is the name of the column:
select creative_str, ltrim(rtrim(substring(regexp_replace(cast(creative_str as varchar(1000)), '_', repeat(cast(' ' as varchar(1000)),10000)), 30001, 10000)))
from impression_cr
You should be able to do this with Hive's SPLIT() function. Hive arrays are zero-indexed, so index 3 is the fourth field, which is the value between the third and fourth underscores:
SELECT SPLIT("2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250", "[_]")[3],
SPLIT("2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600", "[_]")[3],
SPLIT("2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600", "[_]")[3]
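The indexing is easy to check with a minimal Python sketch, since Python lists are zero-indexed in the same way. The function name fourth_field is hypothetical, and the sketch assumes every row contains at least four underscore-separated fields.

```python
def fourth_field(creative_str: str) -> str:
    """Return the field between the 3rd and 4th underscores (index 3)."""
    return creative_str.split("_")[3]


rows = [
    "2016_sadfsa_IL_THIS_xsdaf_asd_eventbyevent_tsaC_NA_300x250",
    "2017_thisshopper_MA_THIS_NAT_Leb_ReasonsWhy_HDIMC_NA_300x600",
    "2017_FordShopper_IL_THESE_NAT_sov_winterEvent_HDIMC_NA_300x600",
]
for row in rows:
    print(fourth_field(row))
```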

Display certain sequence only in VARCHAR

I have a column error_desc with values like:
Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX.
What SQL query can I use to display only the number 111232 from that column? The number starts at the 66th position of the VARCHAR column and ends at the 71st.
SELECT substr(ERROR_DESC,66,6) as ABC FROM bl1_cycle_errors where error_desc like '%FAN%'
This solution uses regular expressions.
The challenge I faced was dealing with alphanumeric tokens. To detect the standalone number, we have to retain only pure numbers and filter out words, alphanumeric tokens, and punctuation.
Plain words and any other non-digit characters can easily be filtered out using
[^[:digit:]]
Possible combinations of alphanumerics are:
1.Begins with letters, contains numbers, may end with letters or punctuation:
[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*
2.Begins with numbers and then contains letters, may contain punctuation:
[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*
3.Begins with numbers and then contains punctuation, may contain letters:
[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*
Combining these regular expressions using | operator we get:
select trim(REGEXP_REPLACE(error_desc,'[^[:digit:]]|[a-zA-Z]+[0-9]+[[:punct:]]*[a-zA-Z]*[[:punct:]]*|[0-9]+[[:punct:]]*[a-zA-Z]+[[:punct:]]*|[0-9]+[a-zA-Z]*[[:punct:]]+[a-zA-Z]*',' '))
from error_table;
This will work in most cases.
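To see how the combined alternation behaves on the sample message, here is a minimal Python sketch of the same idea. Python's re module has no POSIX classes, so the sketch assumes that [[:punct:]] can be approximated by string.punctuation and [^[:digit:]] by [^0-9]; the function name standalone_number is hypothetical, and the database's regex engine may differ in detail.

```python
import re
import string

# Assumed stand-in for the POSIX class [[:punct:]].
PUNCT = "[" + re.escape(string.punctuation) + "]"

PATTERN = re.compile(
    "|".join(
        [
            r"[^0-9]",                                        # any non-digit
            rf"[a-zA-Z]+[0-9]+{PUNCT}*[a-zA-Z]*{PUNCT}*",     # letters, then digits
            rf"[0-9]+{PUNCT}*[a-zA-Z]+{PUNCT}*",              # digits, then letters
            rf"[0-9]+[a-zA-Z]*{PUNCT}+[a-zA-Z]*",             # digits, then punctuation
        ]
    )
)


def standalone_number(error_desc: str) -> str:
    """Blank out everything the pattern matches, then trim the result."""
    return PATTERN.sub(" ", error_desc).strip()


msg = (
    "Failure occurred in (Class::Method) xxxxCalcModule::endCustomer. "
    "Fan id 111232 is not Effective or not present in BL9_XXXXX for date 20160XXX."
)
print(standalone_number(msg))
```

The digits of 111232 survive because none of the alternatives can match a run of digits that is followed by a space, while 9_XXXXX and 20160XXX. are consumed by the digits-then-letters alternative.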