SQL, extract everything before 5th comma - sql

For example, my column "tags" have
"movie/spiderman,genre/action,movie:marvel",
"movie/kingsman,genre/action",
"movie/spiderman,genre/action,movie:marvel,movie:dfjkl,movie:fskj,movie:aa,movie:mdkk"
I'm trying to return everything before 5th comma. below is the result example
"movie/spiderman,genre/action,movie:marvel",
"movie/kingsman,genre/action",
"movie/spiderman,genre/action,movie:marvel,movie:dfjkl,movie:fskj"
I've tried below code but it's not working.
select
NVL(SUBSTRING(tags, 1,REGEXP_INSTR(tags,',',1,5) -1),tags)
from myTable

You can use
REGEXP_REPLACE(tags, '^(([^,]*,){4}[^,]*).*', '\\1')
See the regex demo.
The REGEXP_REPLACE will find the occurrence of the following pattern:
^ - start of string
(([^,]*,){4}[^,]*) - Group 1 (\1 refers to this part of the match): four sequences of any zero or more chars other than a comma and a comma, and then zero or more chars other than a comma
.* - the rest of the string.
The \1 replacement restores Group 1 value in the resulting string.

Related

Extract substring with a specific pattern in Hive SQL

I have a column with this sample data. I need to extract all substring that starts with "M6". Is there a way to do it with regexp_extract?
Data Column
HEY01230328_M6K21SG_UNO_NYC_241
M6EW2BJ_UNO_NYC_251
M6HW2WL_UNO_NYC_251
HEY08460329_NA_M6LAB3D_UNO_NYC_241
Desired Output
M6K21SG
M6EW2BJ
M6HW2WL
M6LAB3D
Try using:
SELECT colname FROM tableName WHERE REGEXP_EXTRACT(colname, ".*(M6[^_]*).*",1)
Regex used:
.*(M6[^_]*).*
Regex Demo
Explanation:
.* - matches 0+ occurrences of any character that is not a newline character
(M6[^_]*) - matches M6 followed by 0+ occurrences of any character that is not a _. So, after M6, it keeps on matching everything until it finds the next _. The parenthesis is used to store this sub-match in Group 1
.* - matches 0+ occurrences of any character that is not a newline character

Regular expression - capture number between underscores within a sequence between commas

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This is format is uniform to the entire field. Three underscore separated numeric values, a comma, three more underscore separated numeric values, another comma and then three underscore separated text values. No spaces in between
I want to extract the middle value from the second numeric sequence, in the example above I want to get 444
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to identify the first comma, first number after and the following underscore ,222_ to use as a starting point and from there get the next number without the _ after it
This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten
We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
Here is a demo showing that the above regex' first capture group contains the quantity you want.
Demo

Insert comma after every 7th character using regex and hive sql

Insert comma after every 7th character and make sure the data is having comma after every 7th character correctly using regex in hive sql.
Also to ignore the space while selecting the 7th character.
Sample Input Data:
12F123f, 123asfH 0DB68ZZ, AG12453
112312f, 1212sfH 0DB68ZZ, AQ13463
Output:
12F123f,123asfH,0DB68ZZ,AG12453
112312f,1212sfH,0DB68ZZ,AQ13463
I tried the below code, but it didn't work and insert the commas correctly.
select regexp_replace('12345 12456,12345 123', '(/(.{5})/g,"$1$")','')
I think you can use
select regexp_replace('12345 12456,12345 123', '(?!^)[\\s,]+([^\\s,]+)', ',$1')
See the regex demo
Details
(?!^) - no match if at string start
[\s,]+ - 1 or more whitespaces or commas
([^\s,]+) - Capturing group 1: one or more chars other than whitespaces and commas.
The ,$1 replacement replaces the match with a comma and the value in Group 1.
You just want to replace the empty char to ,, am I right? the SQL as below:
select regexp_replace('12F123f,123asfH 0DB68ZZ,AG12453',' ',',') as result;
+----------------------------------+--+
| result |
+----------------------------------+--+
| 12F123f,123asfH,0DB68ZZ,AG12453 |
+----------------------------------+--+

Delete specific pattern between commas in text file

I have thousand of SQL queries written over notepad++ line by line.Single line contain single SQL query.Every SQL query contain list of columns to be selected from database as comma separated values.Now we want certain columns not to be part of that list which follow a specific pattern/regular expression.The SQL query follows a specific pattern :
A trimmed column has been selected as alias 'PK'
Every query has got a 'dated'where condition at the end of it.
Sometimes the pattern which we wish to remove exist in either PK/where or both.we don't want to remove that column/pattern from those places.Just from the column selection list.
Below is the example of a SQL query :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA,TAE_RID_OWNER,TAE_FID_OWNER,TAE_CID_OWNER,TAE_TSP_REC_UPDATE from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
After removal of columns/patterns query should look like below :
select (TRIM(TAE_TSP_REC_UPDATE)) as PK,TAE_AMT_FAIR_MV,TAE_TXT_ACCT_NUM,TAE_CDE_OWNER_TYPE,TAE_DTE_AQA_ABA from TABLE_TAX_REP where DATE(TAE_TSP_REC_UPDATE)>='03/31/2018'
want to remove below patterns from each and every query between the commas :
.FID.
.RID.
.CID.
.TSP.
If the pattern exist within TRIM/DATE function it should not be touched.It should only be removed from column selection list.
Could somebody please help me regarding above.Thanks in advance
You may use
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$))(?:(?!\sfrom\s).)*?\K,?\s*[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+
Details
(?:\G(?!^)|\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$)) - two alternatives:
\G(?!^) - the end of the previous location, not a position at the start of the line
| - or
\sas\s(?=.*'\d{2}/\d{2}/\d{4}'$) - an as surrounded with single whitespaces that is followed with any 0+ chars other than line break chars and then ', 2 digits, /, 2 digits, /, 4 digits and ' at the end of the line
(?:(?!\sfrom\s).)*? - consumes any char other than a linebreak char, 0 or more repetitions, as few as possible, that does not start whitespace, from, whitespace sequence
\K - a match reset operator discarding all text matched so far
,?\s* - an optional comma followed with 0+ whitespaces
[A-Z_]+_(?:[FRC]ID|TSP)_[A-Z_]+ - ASCII letters or/and _, 1 or more occurrences, followed with _, then F, R or C followed with ID or TSP, then _, and again 1 or more occurrences of ASCII letters or/and _.
See the regex demo.

How to trim leading zeroes in comma separated values in Oracle?

I have a table CUSTOMER (ID,NAME,CORR_ID)
Sample Data -
ID|Name|Corr_ID
1 |John|0239,002319
2 |Mary|000466,00000000054667
Here is the output I need to see -
ID|Name|Corr_ID
1 |John|239,2319
2 |Mary|466,54667
regexp_replace(corr_id, '(,|^)0+', '\1')
will strip leading zeros without stripping zeros from 1000 or 2003.
The function looks for EITHER a comma OR the beginning of the string (^) - that is what (,|^) means - and then for one or more consecutive occurrences of the character '0'. It replaces each match with what was in the first pair of parentheses - that is, with the comma or the beginning of the string; it "zaps" all the leading zeros. You need to do it carefully, with the (,|^) - to make sure you don't drop zeros from 1000 or 2003.