I have data in table:
| id | question              |
+----+-----------------------+
| 1  | 1.1 Covid-19 [cases]  |
| 2  | 1.1 Covid-19 [deaths] |
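I want to split the data into columns to get the output below:
| id | questionid | question_name | sub_question_name |
+----+------------+---------------+-------------------+
| 1  | 1.1        | Covid-19      | cases             |
| 2  | 1.1        | Covid-19      | deaths            |
Is there a function that produces the output above?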
One way to do this is with PostgreSQL's SPLIT_PART function, which splits a string on a delimiter (in your case, the space) and returns the requested field. This assumes question_name is always a single word. For the last field, you can split on the opening bracket and remove the trailing closing bracket with the RTRIM function.
SELECT id,
SPLIT_PART(question, ' ', 1) AS questionid,
SPLIT_PART(question, ' ', 2) AS question_name,
RTRIM(SPLIT_PART(question, '[', 2), ']') AS sub_question_name
FROM tab
For details on these functions, see the string functions section of the official PostgreSQL documentation.
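To check the SPLIT_PART/RTRIM logic outside the database, here is a small Python sketch. `split_part` and `split_question` are hypothetical helpers. They mimic PostgreSQL's 1-based SPLIT_PART (positive field numbers only) and RTRIM(..., ']'). The made-up row '1.2 Long COVID [cases]' shows the single-word limitation: the name is cut at the first space.

```python
def split_part(s: str, delim: str, n: int) -> str:
    """Mimic PostgreSQL SPLIT_PART for n >= 1: field n, or '' if it does not exist."""
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def split_question(question: str):
    question_id = split_part(question, " ", 1)
    question_name = split_part(question, " ", 2)
    # RTRIM(x, ']') strips trailing ']' characters, just like str.rstrip("]")
    sub_question_name = split_part(question, "[", 2).rstrip("]")
    return question_id, question_name, sub_question_name


for q in ["1.1 Covid-19 [cases]", "1.1 Covid-19 [deaths]", "1.2 Long COVID [cases]"]:
    print(split_question(q))
```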
EDIT: For more advanced matching, consider regular expressions with PostgreSQL's pattern-matching functions:
SELECT id,
(REGEXP_MATCHES(question, '^[\d\.]+'))[1] AS questionid,
TRIM((REGEXP_MATCHES(question, '(?<= )[^[]+'))[1]) AS question_name,
(REGEXP_MATCHES(question, '(?<=\[).*(?=\]$)'))[1] AS sub_question_name
FROM tab
Regex for questionid, explained:
^: start of string
[\d\.]+: one or more digits or dots
Regex for question_name, explained:
(?<= ): positive lookbehind requiring a space just before the match
[^[]+: one or more characters other than [
Regex for sub_question_name, explained:
(?<=\[): positive lookbehind requiring an opening bracket just before the match
.*: any sequence of characters
(?=\]$): positive lookahead requiring a closing bracket at the end of the string
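Python's `re` module supports the same fixed-width lookbehind and lookahead, so the three patterns can be tried directly. The sketch below uses a hypothetical helper `parse_question` and assumes every row matches all three patterns; otherwise `re.search` returns None. The question_name match runs up to '[' and so keeps the space before it. The sketch strips that space, which is the job TRIM does in SQL.

```python
import re


def parse_question(question: str):
    """Apply the three lookaround patterns with Python's re module."""
    question_id = re.search(r"^[\d.]+", question).group(0)
    # The match stops at '[' and includes the preceding space; strip() removes it.
    question_name = re.search(r"(?<= )[^\[]+", question).group(0).strip()
    sub_question_name = re.search(r"(?<=\[).*(?=\]$)", question).group(0)
    return question_id, question_name, sub_question_name


print(parse_question("1.1 Covid-19 [cases]"))
print(parse_question("1.2 Long COVID [cases]"))
```

Unlike the SPLIT_PART version, this handles multi-word names such as the made-up 'Long COVID'.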
You can also use regexp_replace. In this example the pattern ^(\[)(.*)(\])$ has three groups: group 1, ^(\[), is the opening bracket; group 2, (.*), is the text between the brackets; group 3, (\])$, is the closing bracket.
The third argument, '\2', replaces the whole match with group 2 only, so the brackets are dropped.
select
id,
split_part(question, ' ', 1) p1,
split_part(question, ' ', 2) p2,
regexp_replace(split_part(question, ' ', 3), '^(\[)(.*)(\])$', '\2') p3
from
covid;
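The same group-based replacement can be tried with Python's `re.sub`, where `r"\2"` plays the role of the '\2' argument. `strip_brackets` is a hypothetical helper. A token without brackets does not match, so it comes back unchanged, as with regexp_replace.

```python
import re


def strip_brackets(token: str) -> str:
    # Groups: 1 = '[', 2 = the inner text, 3 = ']'; r"\2" keeps only group 2.
    return re.sub(r"^(\[)(.*)(\])$", r"\2", token)


question = "1.1 Covid-19 [cases]"
p1, p2, p3 = question.split(" ")[:3]  # like split_part(question, ' ', 1..3)
print(p1, p2, strip_brackets(p3))
```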
Given a field with combinations of letters and numbers, is there a way to get the last (rightmost) pair of letters in SQL?
SAMPLE DATA
RT34-92837DF82982
DRE3-9292928373DO
For those, I would want DF and DO respectively.
For clarity, there will only be numbers after these letters.
Edit: This is for SQL Server.
I would remove every character that isn't a letter, using REGEXP_REPLACE or a similar function, depending on your DBMS (most SQL Server versions have no built-in REGEXP_REPLACE).
regexp_replace(col1, '[^a-zA-Z]+', '')
Then use RIGHT or SUBSTRING to select the rightmost two letters.
right(regexp_replace(col1, '[^a-zA-Z]+', ''), 2)
substring(regexp_replace(col1, '[^a-zA-Z]+', ''), len(regexp_replace(col1, '[^a-zA-Z]+', '')) - 1, 2)
If the value can contain isolated single letters (e.g. 'DF1234A124'), you could change the regex pattern to remove those as well: ([^a-zA-Z][a-zA-Z][^a-zA-Z])|[^a-zA-Z]
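A quick Python sketch of the remove-then-take-right idea, using `re.sub` in place of REGEXP_REPLACE and slicing in place of RIGHT(..., 2). `last_letter_pair` is a hypothetical helper. With the default pattern, 'DF1234A124' yields 'FA'. The single-letter pattern removes '4A1' first and yields 'DF'.

```python
import re

SINGLE_LETTER_PATTERN = r"([^a-zA-Z][a-zA-Z][^a-zA-Z])|[^a-zA-Z]"


def last_letter_pair(val: str, pattern: str = r"[^a-zA-Z]+") -> str:
    letters = re.sub(pattern, "", val)  # like regexp_replace(col1, pattern, '')
    return letters[-2:]                 # like right(..., 2)


for v in ["RT34-92837DF82982", "DRE3-9292928373DO"]:
    print(last_letter_pair(v))
print(last_letter_pair("DF1234A124", SINGLE_LETTER_PATTERN))
```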
Since you said only digits follow these letters, you can use the TRIM and RIGHT functions as follows:
select
Right(Trim('0123456789' from val), 2) as res
from t
Note: TRIM is available from SQL Server 2017 onward.
For older versions, try the following:
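The TRIM approach translates almost directly to Python. `str.strip` with a character set removes those characters from both ends, as TRIM('0123456789' FROM val) does. Slicing then takes the last two characters. `last_letter_pair_trim` is a hypothetical helper, and it relies on the question's guarantee that only digits follow the pair.

```python
DIGITS = "0123456789"


def last_letter_pair_trim(val: str) -> str:
    # Strip digits from both ends, then keep the last two characters (RIGHT(..., 2)).
    return val.strip(DIGITS)[-2:]


print(last_letter_pair_trim("RT34-92837DF82982"))
```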
select
Left
(
Right(val, PATINDEX('%[A-Z]%', Reverse(val))+1),
2
) as res
from t
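The pre-2017 version works by reversing the string. PATINDEX then finds the last letter, and RIGHT/LEFT cut out the pair that ends there. Below is a Python sketch of the same steps. `patindex_first_letter` is a hypothetical, rough stand-in for PATINDEX('%[A-Z]%', ...). It assumes a case-insensitive collation, so letters of either case match, and it treats only ASCII letters as letters.

```python
def patindex_first_letter(s: str) -> int:
    """1-based position of the first ASCII letter in s, or 0 if there is none."""
    for pos, ch in enumerate(s, start=1):
        if ch.isascii() and ch.isalpha():
            return pos
    return 0


def last_letter_pair_legacy(val: str) -> str:
    pos = patindex_first_letter(val[::-1])  # PATINDEX('%[A-Z]%', Reverse(val))
    tail = val[-(pos + 1):]                 # Right(val, pos + 1)
    return tail[:2]                         # Left(..., 2)


print(last_letter_pair_legacy("RT34-92837DF82982"))
```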
Community,
I need help removing the underscores '_' and making the name readable, with each part capitalized (first letter uppercase, the rest lowercase), while also removing the number. Hope this makes sense. I am running Presto and using Query Fabric. Is there a better way to write this syntax?
| Email Address               | Needed Outcome    |
+-----------------------------+-------------------+
| Full_Metal_Jacket#movie.com | Full Metal Jacket |
| TOP_GUN2#movie.email.com    | Top Gun           |
Partially working solution:
,REPLACE(SPLIT_PART(T.EMAIL, '#', 1),'_',' ') Name
Something like this:
,LOWER(REPLACE(UPPER(SPLIT_PART(T.EMAIL, '#', 1)),'_',' '))Name
Try this:
WITH t(email) AS (
VALUES 'Full_Metal_Jacket#movie.com', 'TOP_GUN2#movie.email.com'
)
SELECT array_join(
transform(
split(regexp_extract(email, '(^[^0-9#]+)', 1), '_'),
part -> upper(substr(part, 1, 1)) || lower(substr(part, 2))),
' ')
FROM t;
How it works:
Extract the non-numeric prefix before the # with a regex, using regexp_extract.
Split the prefix on _ to produce an array.
Transform the array by capitalizing the first letter of each element and lowercasing the rest.
Finally, join the elements with a space using the array_join function.
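The four steps map one-to-one onto Python. The sketch below is meant only for checking the logic outside Presto. `email_to_name` is a hypothetical helper, and it assumes the address does not start with a digit or '#'; otherwise `re.match` returns None, much as regexp_extract would return NULL.

```python
import re


def email_to_name(email: str) -> str:
    # 1. regexp_extract(email, '(^[^0-9#]+)', 1): prefix before the first digit or '#'
    prefix = re.match(r"^[^0-9#]+", email).group(0)
    # 2. split(prefix, '_')
    parts = prefix.split("_")
    # 3. transform: uppercase the first letter, lowercase the rest
    words = [p[:1].upper() + p[1:].lower() for p in parts]
    # 4. array_join(words, ' ')
    return " ".join(words)


for e in ["Full_Metal_Jacket#movie.com", "TOP_GUN2#movie.email.com"]:
    print(email_to_name(e))
```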
Update:
Here's another variant without involving transform and the intermediate array:
regexp_replace(
replace(regexp_extract(email, '(^[^0-9#]+)', 1), '_', ' '),
'(\w)(\w*)',
x -> upper(x[1]) || lower(x[2]))
Like the approach above, it first extracts the non-numeric prefix, then replaces underscores with spaces using the replace function, and finally uses regexp_replace to process each word. The (\w)(\w*) regular expression captures the first letter of each word and the rest of the word into two separate capture groups. The lambda expression x -> upper(x[1]) || lower(x[2]) then capitalizes the first letter (first capture group, x[1]) and lowercases the rest (second capture group, x[2]).
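Python's `re.sub` also accepts a function as the replacement, which mirrors the lambda form of regexp_replace. Here `m.group(1)` and `m.group(2)` correspond to `x[1]` and `x[2]`. `email_to_name_regex` is a hypothetical helper used for illustration.

```python
import re


def email_to_name_regex(email: str) -> str:
    prefix = re.match(r"^[^0-9#]+", email).group(0).replace("_", " ")
    # The function receives each match object: group 1 is the first letter, group 2 the rest.
    return re.sub(
        r"(\w)(\w*)",
        lambda m: m.group(1).upper() + m.group(2).lower(),
        prefix,
    )


print(email_to_name_regex("TOP_GUN2#movie.email.com"))
```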
I have a table in BigQuery:
ab_col_jfsfhfd_ggg_sdf
arfd_am_fdsf_fddg_fg
d_fdf_fdddg_ffddd_f
I would like to extract the characters between the first _ and the second _. I want to get the following:
col
am
fdf
I used the following regular expression to extract the characters but it does not work as intended:
^.*\_(\D+)\_.*$
regexp_replace(id,'^.*\\_(\\D+)\\_.*$' , '\\1')
Please help!
If I follow you correctly, you can use split():
(split(col, '_'))[safe_ordinal(2)]
split() turns the string into an array of values using the given separator (here, _); then we just take the second element. SAFE_ORDINAL returns NULL instead of raising an error when the array has fewer than two elements.
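A minimal Python equivalent of `split(col, '_')[SAFE_ORDINAL(2)]`, where returning None stands in for BigQuery's NULL. `second_field` is a hypothetical helper.

```python
from typing import Optional


def second_field(value: str, sep: str = "_") -> Optional[str]:
    parts = value.split(sep)                      # split(col, '_')
    return parts[1] if len(parts) >= 2 else None  # [SAFE_ORDINAL(2)]: NULL if missing


for v in ["ab_col_jfsfhfd_ggg_sdf", "arfd_am_fdsf_fddg_fg", "d_fdf_fdddg_ffddd_f"]:
    print(second_field(v))
```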
split() is a very simple way of solving this, but regular expressions are also quite simple:
with t as (
select 'ab_col_jfsfhfd_ggg_sdf' as id union all
select 'arfd_am_fdsf_fddg_fg' union all
select 'd_fdf_fdddg_ffddd_f'
)
select id, regexp_extract(id, '[^_]+', 1, 2)
from t;
The logic for the pattern is: "Look for any string of characters that is not an underscore. Then take the second one in the string."
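The "second match of [^_]+" logic can be sketched in Python with `re.findall`. `nth_match` is a hypothetical helper, only a rough analogue of REGEXP_EXTRACT(value, pattern, 1, occurrence). It assumes the pattern has no capture groups, because `findall` returns groups instead of whole matches when they are present.

```python
import re
from typing import Optional


def nth_match(value: str, pattern: str, occurrence: int) -> Optional[str]:
    """Return the occurrence-th (1-based) match of pattern in value, or None."""
    matches = re.findall(pattern, value)
    return matches[occurrence - 1] if 0 < occurrence <= len(matches) else None


print(nth_match("ab_col_jfsfhfd_ggg_sdf", "[^_]+", 2))
```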
Use regexp_extract:
regexp_extract(id,'^[^_]+_([^_]+)')
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^_]+ any character except: '_' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
_ '_'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^_]+ any character except: '_' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
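The anchored pattern can be compared with the pattern from the question in Python. In the question's pattern, the greedy `^.*` consumes as much as it can. The capture therefore lands on the second-to-last field ('ggg' for the first row), not the second. The sketch writes `\_` as a plain `_`, which matches the same character.

```python
import re

ROWS = ["ab_col_jfsfhfd_ggg_sdf", "arfd_am_fdsf_fddg_fg", "d_fdf_fdddg_ffddd_f"]

# Anchored: first run of non-underscores, an underscore, then capture the next run.
ANSWER = re.compile(r"^[^_]+_([^_]+)")
# Question's pattern: greedy ^.* backtracks only as far as needed, so group 1
# ends up being the second-to-last underscore-delimited field.
QUESTION = re.compile(r"^.*_(\D+)_.*$")

for row in ROWS:
    print(ANSWER.match(row).group(1), QUESTION.sub(r"\1", row))
```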
I need to get a substring from a table column that is after a colon and before a whitespace. The length of the substring can vary, but the length of the data before the colon and after the whitespace is constant.
So the data in my table column named "Subject" consists of 5 words, immediately followed by a colon, immediately followed by the substring I need (which can vary in length), followed by a whitespace and a date. The substring I need is a course name. Examples:
Payment Due for Upcoming Course:FIN/370T 11/26/2019
Payment Due for Upcoming Course:BUS/475 11/26/2019
Payment Due for Upcoming Course:ADMIN9/475TG 11/26/2019
I have tried REGEXP_SUBSTR(COLUMN_NAME,'[^:]+$') to get everything after the colon, and REGEXP_SUBSTR(COLUMN_NAME, '[^ ]+', 1, 5) to get the data before the last whitespace, but I need to combine them.
I have tried the following:
select
REGEXP_SUBSTR(SUBJECT,'[^:]+$') COURSE_ID
from TABLE
Result:
FIN/370T 11/26/2019
and this:
select
REGEXP_SUBSTR (SUBJECT, '[^ ]+' , 1 , 5 ) COURSE_ID2
from TABLE
Result:
Course:FIN/370T
I need the output to return FIN/370T
In short, use:
select regexp_replace(str,'(.*:)(.*)( )(.*)$','\2') as short_course_id
from tab
I prefer regexp_replace because its capture groups give you more options for extracting parts of a string.
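The same four-group pattern works with Python's `re.sub`, which is handy for checking it against the sample subjects. `course_id` is a hypothetical helper. The first greedy group runs to the last colon and the second greedy group backs off to the last space, so group 2 is exactly the course ID.

```python
import re

SUBJECTS = [
    "Payment Due for Upcoming Course:FIN/370T 11/26/2019",
    "Payment Due for Upcoming Course:BUS/475 11/26/2019",
    "Payment Due for Upcoming Course:ADMIN9/475TG 11/26/2019",
]


def course_id(subject: str) -> str:
    # Group 1: up to the last ':'; group 2: course ID; group 3: last space; group 4: date.
    return re.sub(r"(.*:)(.*)( )(.*)$", r"\2", subject)


for s in SUBJECTS:
    print(course_id(s))
```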
If you don't want to deal with regex, you can use a combination of SUBSTR and INSTR.
select
substr(part1, 1, instr(part1, ' ', -1, 1) - 1) as course,
part1
from (
select
substr(<your column>,instr(<your column>,':',1,1) +1) as part1
from
<your table>
) t
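A Python sketch of the SUBSTR/INSTR approach. `str.index` finds the colon and `str.rindex` finds the last space, playing the part of INSTR with a negative start position. `course_id_substr` is a hypothetical helper. Slicing up to the space keeps the space itself out of the result.

```python
def course_id_substr(subject: str) -> str:
    # substr(col, instr(col, ':') + 1): everything after the first colon
    part1 = subject[subject.index(":") + 1:]
    # Everything before the last space in part1
    return part1[: part1.rindex(" ")]


print(course_id_substr("Payment Due for Upcoming Course:FIN/370T 11/26/2019"))
```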
One option would be
select replace(regexp_substr(str,'[^:]+$'),
regexp_substr(str,'[^:][^ ]+$'),'') as course_id
from tab
Here the first regexp_substr() extracts everything after the colon, and the second extracts everything from the last space to the end; replacing the latter with an empty string leaves just the course ID.
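In Python, `re.search` returns the leftmost match, like REGEXP_SUBSTR, so the replace trick can be reproduced directly. `course_id_replace` is a hypothetical helper. For '[^:][^ ]+$', the leftmost match that reaches the end of the string can only start at the last space, so the removed text is the space plus the date.

```python
import re


def course_id_replace(subject: str) -> str:
    after_colon = re.search(r"[^:]+$", subject).group(0)          # text after the colon
    from_last_space = re.search(r"[^:][^ ]+$", subject).group(0)  # last space + date
    return after_colon.replace(from_last_space, "")


print(course_id_replace("Payment Due for Upcoming Course:BUS/475 11/26/2019"))
```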
I have a string that appears as:
00012345678 Rain, Kip
I would like to strip the leading digits, then swap the first and last name to get:
Kip Rain
I was thinking that I could use INSTR({string},',','1') to get to the first comma, but I am unsure how to handle both the numbers and the punctuation in one line. Would I have to chain INSTR calls?
Thanks for your help!
You can chain them, but with complicated expressions it quickly becomes hard to work out what's happening. Unless you have demonstrable performance concerns, it's often quicker to use regular expressions. In this case, it's probably easiest to use REGEXP_REPLACE():
select regexp_replace(your_string
, '[^[:alpha:]]+([[:alpha:]]+)[^[:alpha:]]+([[:alpha:]]+)'
, '\2 \1')
from ...
The second parameter is the pattern: one or more characters that are not alphabetic ([^[:alpha:]]+), followed by one or more alphabetic characters ([[:alpha:]]+). This is repeated to account for the comma and space, and it matches your string as follows:
|string | matched by |
+--------------+----------------+
|'00012345678 '| [^[:alpha:]]+ |
|'Rain' | ([[:alpha:]]+) |
|', ' | [^[:alpha:]]+ |
|'Kip' | ([[:alpha:]]+) |
The parentheses here define groups: the first pair is group 1, the second pair is group 2.
The third parameter of REGEXP_REPLACE() tells Oracle what to replace the match with, and this is where the groups come in: you can reference them in any order. In this instance I want the second group (Kip), followed by a space, followed by the first group (Rain).
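Python's `re` module has no POSIX classes such as [[:alpha:]]. The sketch below therefore assumes ASCII names and substitutes [A-Za-z]; otherwise it is the same pattern and the same group-reordering replacement. `swap_name` is a hypothetical helper.

```python
import re

# ASCII stand-ins: [^[:alpha:]] -> [^A-Za-z], [[:alpha:]] -> [A-Za-z]
PATTERN = r"[^A-Za-z]+([A-Za-z]+)[^A-Za-z]+([A-Za-z]+)"


def swap_name(s: str) -> str:
    # r"\2 \1": second group (first name), a space, then the first group (last name)
    return re.sub(PATTERN, r"\2 \1", s)


print(swap_name("00012345678 Rain, Kip"))
```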
Yes, it is alright to chain them:
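substr(str, 1, instr(str, ' ') - 1) number_part
substr(str, instr(str, ' ') + 1, instr(str, ',') - instr(str, ' ') - 1) last_name
substr(str, instr(str, ',') + 2) first_name
The last expression assumes a single space after the comma; omitting the length argument makes SUBSTR return everything up to the end of the string, so no LENGTH(str) is needed. Concatenate first_name || ' ' || last_name to get Kip Rain.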
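A Python sketch of the position-based chaining. Python indices are 0-based, so each Oracle `instr(...)` result corresponds to `str.index(...) + 1` here. `split_and_swap` is a hypothetical helper, and it assumes exactly one space after the comma.

```python
def split_and_swap(s: str):
    space = s.index(" ")        # 0-based position of the first space
    comma = s.index(",")        # 0-based position of the comma
    number_part = s[:space]
    last_name = s[space + 1:comma]
    first_name = s[comma + 2:]  # skip ', '
    return number_part, f"{first_name} {last_name}"


print(split_and_swap("00012345678 Rain, Kip"))
```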
I am biased towards using the regular expression variation of the substr function.
First, obtain the first run of alphabetic characters (or hyphens) as follows:
REGEXP_SUBSTR('00012345678 Rain, Kip','([[:alpha:]]|[-])+',1,1)
where [[:alpha:]] is a character class where all alphabetic characters are included.
The bracketed expression, [-], is just a matching list which is my way of identifying that the last name, Rain, could include a hyphen. The alternation operator, '|', states that either the alphabetic or hyphen characters are acceptable.
The '+' indicates that we are looking to match one or more occurrences.
Second, obtain the run of characters at the end of the string that contains no comma or space:
REGEXP_SUBSTR('00012345678 Rain, Kip','[^, ]+$',1,1)
Here, I anchor the match to the end of the string with '$' and take every character after the last comma or space.
Next, I combine the two (with a space in between) using the concatenation operator, ||:
REGEXP_SUBSTR('00012345678 Rain, Kip','[^, ]+$',1,1) || ' ' || REGEXP_SUBSTR('00012345678 Rain, Kip','([[:alpha:]]|[-])+',1,1)
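A Python sketch of the two-substring approach, with `re.search` standing in for REGEXP_SUBSTR (first occurrence). As before, [A-Za-z] is an ASCII assumption in place of [[:alpha:]]. `swap_name_substr` is a hypothetical helper. The made-up name 'Smith-Jones' shows why the hyphen is included.

```python
import re


def swap_name_substr(s: str) -> str:
    # First run of letters or hyphens: the last name
    last_name = re.search(r"([A-Za-z]|[-])+", s).group(0)
    # Trailing run with no comma or space: the first name
    first_name = re.search(r"[^, ]+$", s).group(0)
    return first_name + " " + last_name


print(swap_name_substr("00012345678 Rain, Kip"))
print(swap_name_substr("00012345678 Smith-Jones, Kip"))
```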