Insert comma after every 7th character using regex and Hive SQL

I want to insert a comma after every 7th character using a regex in Hive SQL, and make sure the result has a comma correctly placed after every 7th character.
Spaces should be ignored when counting to the 7th character.
Sample Input Data:
12F123f, 123asfH 0DB68ZZ, AG12453
112312f, 1212sfH 0DB68ZZ, AQ13463
Output:
12F123f,123asfH,0DB68ZZ,AG12453
112312f,1212sfH,0DB68ZZ,AQ13463
I tried the code below, but it didn't insert the commas correctly.
select regexp_replace('12345 12456,12345 123', '(/(.{5})/g,"$1$")','')
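For comparison, here is a sketch of a literal "comma after every 7th non-space character" approach. It uses Python's re module rather than Hive, and the technique does not come from the thread. It removes the existing spaces and commas, then appends a comma after each 7-character run except the last one. The function name comma_every_7 is hypothetical. A Hive version would presumably need two nested regexp_replace calls; that translation is not tested here.

```python
import re

def comma_every_7(value: str) -> str:
    # Drop existing separators (whitespace and commas) so they are not counted.
    compact = re.sub(r"[\s,]", "", value)
    # Append a comma after each run of 7 characters, except at the very end.
    return re.sub(r"(.{7})(?!$)", r"\1,", compact)

print(comma_every_7("12F123f, 123asfH 0DB68ZZ, AG12453"))
```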

I think you can use
select regexp_replace('12345 12456,12345 123', '(?!^)[\\s,]+([^\\s,]+)', ',$1')
See the regex demo
Details
(?!^) - no match if at string start
[\s,]+ - 1 or more whitespaces or commas
([^\s,]+) - Capturing group 1: one or more chars other than whitespaces and commas.
The ,$1 replacement replaces the match with a comma and the value in Group 1.
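A minimal sketch that reproduces the answer's pattern with Python's re module. It assumes Python treats (?!^), \s and these character classes the same way as Java regex (which Hive's regexp_replace uses) for these inputs. The only visible difference is that Python writes the group reference as \1 instead of $1. The name normalize is hypothetical.

```python
import re

# Same pattern as the Hive query above.
SEPARATOR_RUN = re.compile(r"(?!^)[\s,]+([^\s,]+)")

def normalize(value: str) -> str:
    # Replace each run of spaces/commas (not at the start) plus the token after it
    # with a single comma followed by that token.
    return SEPARATOR_RUN.sub(r",\1", value)

for row in ["12F123f, 123asfH 0DB68ZZ, AG12453", "112312f, 1212sfH 0DB68ZZ, AQ13463"]:
    print(normalize(row))
```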

You just want to replace each space with a comma, am I right? The SQL is as follows:
select regexp_replace('12F123f,123asfH 0DB68ZZ,AG12453',' ',',') as result;
+----------------------------------+--+
| result |
+----------------------------------+--+
| 12F123f,123asfH,0DB68ZZ,AG12453 |
+----------------------------------+--+
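The sketch below, in Python rather than Hive, shows what a plain space-to-comma replacement does. It produces the desired output for the answer's input, which has no spaces after the commas. On the question's original sample, with ", " separators, it produces double commas. The helper name is hypothetical.

```python
def spaces_to_commas(value: str) -> str:
    # A literal ' ' pattern has no regex metacharacters, so this mirrors
    # regexp_replace(value, ' ', ',') for these inputs.
    return value.replace(" ", ",")

print(spaces_to_commas("12F123f,123asfH 0DB68ZZ,AG12453"))
print(spaces_to_commas("12F123f, 123asfH 0DB68ZZ, AG12453"))
```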

Related

How do I insert a character every 2 characters in a string?

I have a table with a CHAR(50) column with content like:
AABB
AA
CCXXDD
It's a string used like an array of 25 CHAR(2) elements.
I need to insert a comma every 2 characters:
AA,BB
AA
CC,XX,DD
Is there a system function, or do I need to create one?
We can do a regex replacement here:
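SELECT col, RTRIM(REGEXP_REPLACE(RTRIM(col), '(..)', '\1,'), ',') AS col_out
FROM yourTable;
The above logic inserts a comma after every two characters. The inner RTRIM() first strips the blank padding of the CHAR column; otherwise the trailing blanks would also be split into comma-separated pairs. For inputs with an even number of characters, the replacement leaves an unwanted dangling comma on the right, which the outer RTRIM() removes.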
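A Python sketch of the same pair-splitting idea, with rstrip standing in for RTRIM. It assumes the value is blank-padded like a CHAR(50) column, so the padding is trimmed before the regex runs. comma_every_2 is a hypothetical name.

```python
import re

def comma_every_2(value: str) -> str:
    # Remove CHAR-style blank padding first (assumed padding, as in CHAR(50)).
    trimmed = value.rstrip(" ")
    # Append a comma after every pair, then drop a dangling trailing comma.
    return re.sub(r"(..)", r"\1,", trimmed).rstrip(",")

for row in ["AABB", "AA", "CCXXDD"]:
    print(comma_every_2(row.ljust(50)))
```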

How to strip ending date from string using Regex? (SQL)

I want to reformat the strings in a table column into a specific format.
Input table:
file_paths
my-file-path/wefw/wefw/2022-03-20
my-file-path/wefw/2022-01-02
my-file-path/wef/wfe/wefw/wef/2021-02-03
my-file-path/wef/wfe/wef/
I want to remove everything after the last /, if the only thing after it resembles a date (i.e. YYYY-MM-DD or ####-##-##).
Output:
file_paths
my-file-path/wefw/wefw/
my-file-path/wefw/
my-file-path/wef/wfe/wefw/wef/
my-file-path/wef/wfe/wef/
I'm thinking of doing something like:
SELECT regexp_replace(file_paths, 'regex_here', '', 1, 'i')
FROM my_table
I'm unsure of how to write the RegEx for this though. I'm also open to easier methods of string manipulation, if there are any. Thanks in advance!
You can use
REGEXP_REPLACE ( file_paths, '/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$', '/' )
See the regex demo.
The pattern /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$ is POSIX ERE compliant and matches:
/ - a slash
[0-9]{4} - four digits
- - a hyphen
[0-9]{1,2} - one or two digits
-[0-9]{1,2} - a hyphen and one or two digits
$ - end of string.
If your values can contain trailing whitespace, insert [[:space:]]* before $: /[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}[[:space:]]*$.
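The same pattern is also valid in Python's re module and gives the same result on these samples, so the answer can be checked outside the database with a sketch like this (strip_trailing_date is a hypothetical name):

```python
import re

# Slash, four digits, hyphen, 1-2 digits, hyphen, 1-2 digits, end of string.
TRAILING_DATE = re.compile(r"/[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$")

def strip_trailing_date(path: str) -> str:
    # Replace the matched "/date" suffix with a single slash; other paths are unchanged.
    return TRAILING_DATE.sub("/", path)

print(strip_trailing_date("my-file-path/wefw/wefw/2022-03-20"))
```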
You may use this regex:
^(.*?\/)\d{4}-\d{2}-\d{2}$
You may try this query:
select regexp_replace(file_paths, '^(.*?\/)\\d{4}-\\d{2}-\\d{2}$','$1')
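A Python sketch of the second answer's pattern. The doubled backslashes in the SQL string literal become single backslashes in a Python raw string. Unlike the {1,2} quantifiers above, \d{2} requires two-digit months and days, so a path ending in 2022-3-20 is left unchanged. The helper name is hypothetical.

```python
import re

# Lazy group 1 keeps everything up to the slash that precedes the date.
DATE_SUFFIX = re.compile(r"^(.*?/)\d{4}-\d{2}-\d{2}$")

def strip_date_strict(path: str) -> str:
    return DATE_SUFFIX.sub(r"\1", path)

print(strip_date_strict("my-file-path/wefw/2022-01-02"))
```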

SQL, extract everything before 5th comma

For example, my column "tags" has:
"movie/spiderman,genre/action,movie:marvel",
"movie/kingsman,genre/action",
"movie/spiderman,genre/action,movie:marvel,movie:dfjkl,movie:fskj,movie:aa,movie:mdkk"
I'm trying to return everything before the 5th comma. Below is the expected result:
"movie/spiderman,genre/action,movie:marvel",
"movie/kingsman,genre/action",
"movie/spiderman,genre/action,movie:marvel,movie:dfjkl,movie:fskj"
I've tried the code below, but it's not working.
select
NVL(SUBSTRING(tags, 1,REGEXP_INSTR(tags,',',1,5) -1),tags)
from myTable
You can use
REGEXP_REPLACE(tags, '^(([^,]*,){4}[^,]*).*', '\\1')
See the regex demo.
The REGEXP_REPLACE will find the occurrence of the following pattern:
^ - start of string
(([^,]*,){4}[^,]*) - Group 1 (\1 refers to this part of the match): four repetitions of zero or more non-comma characters followed by a comma, then zero or more non-comma characters
.* - the rest of the string.
The \1 replacement restores Group 1 value in the resulting string.
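The pattern can be checked in Python. Rows with fewer than four commas do not match and come back unchanged, so no NVL-style fallback is needed. before_fifth_comma is a hypothetical name.

```python
import re

# Group 1 = the first five comma-separated items; ".*" swallows the rest.
FIRST_FIVE = re.compile(r"^(([^,]*,){4}[^,]*).*")

def before_fifth_comma(tags: str) -> str:
    # Rows with fewer than four commas do not match and are returned as-is.
    return FIRST_FIVE.sub(r"\1", tags)

print(before_fifth_comma("movie/spiderman,genre/action,movie:marvel,movie:dfjkl,movie:fskj,movie:aa,movie:mdkk"))
```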

Regular expression - capture number between underscores within a sequence between commas

I have a field in a database table in the format:
111_2222_33333,222_444_3,aaa_bbb_ccc
This format is uniform across the entire field: three underscore-separated numeric values, a comma, three more underscore-separated numeric values, another comma, and then three underscore-separated text values. There are no spaces in between.
I want to extract the middle value from the second numeric sequence, in the example above I want to get 444
In a SQL query I inherited, the regex used is ^.,(\d+)_.$ but this doesn't seem to do anything.
I've tried to identify the first comma, the first number after it, and the following underscore (,222_) to use as a starting point, and from there get the next number without the _ after it.
This (,\d*_)(\d+[^_]) selects ,222_444 and is the closest I've gotten.
We can try using REGEXP_REPLACE with a capture group:
SELECT
REGEXP_REPLACE(
'111_2222_33333,222_444_3,aaa_bbb_ccc',
'^[^,]+,[^_]+_(.*?)_[^_]+,.*$',
'\1') AS num
FROM yourTable;
The first capture group of the regex above contains the value you want (444 for the sample string).
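A Python sketch of the same capture-group extraction. Like REGEXP_REPLACE, re.sub returns the input unchanged if the pattern does not match. The function name is hypothetical.

```python
import re

# First block, comma, first number of the second block, underscore,
# lazily captured middle value, underscore, last number, comma, rest.
MIDDLE_OF_SECOND = re.compile(r"^[^,]+,[^_]+_(.*?)_[^_]+,.*$")

def middle_of_second(field: str) -> str:
    return MIDDLE_OF_SECOND.sub(r"\1", field)

print(middle_of_second("111_2222_33333,222_444_3,aaa_bbb_ccc"))
```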

How to return a substring after a colon and before a whitespace in Oracle SQL

I need to get a substring from a table column that is after a colon and before a whitespace. The length of the substring can vary, but the length of the data before the colon and after the whitespace is constant.
So the data in my table column named "Subject" consists of 5 words, immediately followed by a colon, immediately followed by the substring I need (which can vary in length), followed by a whitespace and a date. The substring I need is a course name. Examples:
Payment Due for Upcoming Course:FIN/370T 11/26/2019
Payment Due for Upcoming Course:BUS/475 11/26/2019
Payment Due for Upcoming Course:ADMIN9/475TG 11/26/2019
I have tried using REGEXP function with REGEXP_SUBSTR(COLUMN_NAME,'[^:]+$') to get everything after the colon, and REGEXP_SUBSTR(COLUMN_NAME, '[^ ]+' , 1 , 5 ) to get data before the last whitespace, but I need to combine them.
I have tried the following:
select
REGEXP_SUBSTR(SUBJECT,'[^:]+$') COURSE_ID
from TABLE
Result:
FIN/370T 11/26/2019
and this:
select
REGEXP_SUBSTR (SUBJECT, '[^ ]+' , 1 , 5 ) COURSE_ID2
from TABLE
Result:
Course:FIN/370T
I need the output to return FIN/370T
In short, use:
select regexp_replace(str,'(.*:)(.*)( )(.*)$','\2') as short_course_id
from tab
I prefer regexp_replace because it offers more flexibility for extracting parts of strings.
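A Python sketch of the group-2 replacement. It assumes the subject always has one colon before the course and a single space before the date, as in the samples. course_id is a hypothetical name.

```python
import re

def course_id(subject: str) -> str:
    # Group 1: text up to the colon; group 2: course; group 3: last space; group 4: date.
    return re.sub(r"(.*:)(.*)( )(.*)$", r"\2", subject)

print(course_id("Payment Due for Upcoming Course:FIN/370T 11/26/2019"))
```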
If you don't want to mess with regex, you can use a combo of substr and instr.
select
substr(part1,1,instr(part1, ' ',-1,1) - 1) as course,
part1
from (
select
substr(<your column>,instr(<your column>,':',1,1) +1) as part1
from
<your table>
) t
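A Python sketch of the substr/instr approach. Python slices are 0-based and exclude the end index, so slicing up to the index of the last space drops the space itself. This is the equivalent of subtracting 1 from Oracle's 1-based INSTR result. The helper name is hypothetical.

```python
def course_id_no_regex(subject: str) -> str:
    # part1: everything after the first colon (SUBSTR from INSTR(..., ':', 1, 1) + 1).
    part1 = subject[subject.index(":") + 1:]
    # Keep only what precedes the last space (INSTR(part1, ' ', -1, 1) - 1 characters).
    return part1[: part1.rindex(" ")]

print(course_id_no_regex("Payment Due for Upcoming Course:FIN/370T 11/26/2019"))
```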
One option would be
select replace(regexp_substr(str,'[^:]+$'),
regexp_substr(str,'[^:][^ ]+$'),'') as course_id
from tab
Here, the first regexp_substr() extracts everything after the colon to the end of the string, and the second extracts everything from the last space to the end; replace() then removes the second substring from the first.
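The replace-based option can be traced in Python, where re.search returns the first match in the same way as Oracle's regexp_substr. The sketch assumes both patterns match, as they do for the sample subjects. The helper name is hypothetical.

```python
import re

def course_id_by_removal(subject: str) -> str:
    # Everything after the colon, e.g. "FIN/370T 11/26/2019" (assumes a match exists).
    after_colon = re.search(r"[^:]+$", subject).group(0)
    # One non-colon character (the last space) plus the non-space run to the end.
    last_space_on = re.search(r"[^:][^ ]+$", subject).group(0)
    # Removing the " date" tail leaves only the course name.
    return after_colon.replace(last_space_on, "")

print(course_id_by_removal("Payment Due for Upcoming Course:BUS/475 11/26/2019"))
```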