Redshift skip the first character of split_part() - sql

I have a table column like below:
| cloumn_a |
| ------------------ |
| Alpha_Black_1 |
| Alpha_Black_2323 |
| Alpha_Red_100 |
| Alpha_Blue_2344 |
| Alpha_Orange_33333 |
| Alpha_White_2 |
| |
Usually, when I want to split with any symbol or character I am using the split_part(text, text, integer) so split_part(column_a, '_', 1)
I need to remove the numeric part of each variable and keep only the text part like Alpha_Black.
I cannot use the trim function because the numeric part can change
How can I skip the first underscore and split from the second one?

I would suggest using REGEXP_REPLACE here:
SELECT
column_a,
REGEXP_REPLACE(column_a, '_\\d+$', '') AS column_a_out
FROM yourTable;
Demo

Related

Check string for substring existence

How can I check whether a certain substring (for instance 18UT) is part of a string in a column?
Redshifts' SUBSTRING function allows me to "cut" a certain substring based on a starting index + length of the subtring, but not check whether a specific substring exists is in the column's value.
Example:
+------------------+
| col |
+------------------+
| 14TH, 14KL, 18AB |
| 14LK, 18UT, 15AK |
| 14AB, 08ZT, 18ZH |
| 14GD, 52HG, 18UT |
+------------------+
Desired result:
+------------------+------+
| col | 18UT |
+------------------+------+
| 14TH, 14KL, 18AB | No |
| 14LK, 18UT, 15AK | Yes |
| 14AB, 08ZT, 18ZH | No |
| 14GD, 52HG, 18UT | Yes |
+------------------+------+
Here is one option:
select col,
case when ', ' || col || ', ' like '%, 18UT, %' then 'yes' else 'no' end has_18ut
from mytable
While this will solve your immediate, problem, it should be note that storing delimited lists in a database table is bad practice, and should be avoided. Each value should go to a separate row instead.

Reverse Split_part SQL

Is there a way to there a way to delimit data and select the 2nd to last substring
Sample Input:
*------------------------------------*
| Name |
*------------------------------------*
|Mike__NYC_180x9_School |
|Oak_Ann_1_LA_1x190_Uni |
|Tiger_King_Al_car_12_10x15_sample |
*------------------------------------*
Desired Output:
*--------------*
|Account number|
*--------------*
|180x9 |
|1x190 |
|10x15 |
*--------------*
reverse the string then take the second word and then reverse it again.
select reverse(split_part(reverse(Name),'_',2));

SQL padding 0 to the left of a number in string

I am a beginner in SQL language and I am using postgre sql and doing little exercices to learn. I have a column of strings named acronym from a destination table:
DO1
ES1
ES2
FR1
FR10
FR2
FR3
FR4
FR5
FR6
FR7
FR8
FR9
GP1
GP2
IN1
IN2
MU1
RU1
TR1
UA1
I would like to add a padding zero for acronym numbers that have only one digit, output:
DO01
ES01
ES02
FR01
FR02
FR03
FR04
FR05
FR06
FR07
FR08
FR09
FR10
GP01
GP02
IN01
IN02
MU01
RU01
TR01
UA01
How can I get to the left of the first number in the string? There is some regex I think but I did not figure it out
You can use the rpad() function to add characters to the end of the value:
select rpad(col, '0', 4)
In your case, though, you want a value in-between. On simple method is -- assuming that the first two characters are strings -- is:
(case when length(col) = 3
then left(col, 2) || '0' || right(col, 1)
else col
end)
Another possibility is using regexp_replace():
regexp_replace(col, '^([^0-9]{2})([0-9])$', '\10\2')
Both of these assume that the strings to be padded are three characters, which is consistent with your data. It is unclear what you want for other lengths.
try with below:
to_char() function
select to_char(column1, 'fm000') as column2
from Test_table;
fm "fill mode"prefix avoids leading spaces in the resulting var char.
000 it defines the number of digits you want to have.
You can use string functions like lpad(), substr(), left():
select
concat(left(columnname, 2), lpad(substr(columnname, 3), 2, '0')) result
from tablename
See the demo.
Results:
| result |
| ------ |
| DO01 |
| ES01 |
| ES02 |
| FR01 |
| FR10 |
| FR02 |
| FR03 |
| FR04 |
| FR05 |
| FR06 |
| FR07 |
| FR08 |
| FR09 |
| GP01 |
| GP02 |
| IN01 |
| IN02 |
| MU01 |
| RU01 |
| TR01 |
| UA01 |

hive regexp_extract after second occurrence of delimiter

we have a Hive table column which has string separated by ';' and we need to extract the string after second occurrence of ';'
+-----------------+
| col1 |
+-----------------+
| a;b;c;d |
| e;f; ;h |
| i;j;k;l |
+-----------------+
Required output:
+-----------+
| col1 |
+-----------+
| c |
| <null> |
| k |
+-----------+
select regexp_extract
Split the string on ; which will return an array of values and from this you can get the element at index 2.
select split(str,';')[2]
from tbl
If you want to convert empty and space-only strings to NULLs like in your example, then this macro can be useful:
create temporary macro empty_to_null(s string) case when trim(s)!='' then s end;
select empty_to_null(split(col1,'\\;')[2]);

Oracle SQL regex extraction

I have data as follows in a column
+----------------------+
| my_column |
+----------------------+
| test_PC_xyz_blah |
| test_PC_pqrs_bloh |
| test_Mobile_pqrs_bleh|
+----------------------+
How can I extract the following as columns?
+----------+-------+
| Platform | Value |
+----------+-------+
| PC | xyz |
| PC | pqrs |
| Mobile | pqrs |
+----------+-------+
I tried using REGEXP_SUBSTR
Default first pattern occurrence for platform:
select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)') as platform from table
Getting second pattern occurrence for value:
select regexp_substr(my_column, 'test_(.*)_(.*)_(.*)', 1, 2) as value from table
This isn't working, however. Where am I going wrong?
For Non-empty tokens
select regexp_substr(my_column,'[^_]+',1,2) as platform
,regexp_substr(my_column,'[^_]+',1,3) as value
from my_table
;
For possibly empty tokens
select regexp_substr(my_column,'^.*?_(.*)?_.*?_.*$',1,1,'',1) as platform
,regexp_substr(my_column,'^.*?_.*?_(.*)?_.*$',1,1,'',1) as value
from my_table
;
+----------+-------+
| PLATFORM | VALUE |
+----------+-------+
| PC | xyz |
+----------+-------+
| PC | pqrs |
+----------+-------+
| Mobile | pqrs |
+----------+-------+
(.*) is greedy by nature, it will match all character including _ character as well, so test_(.*) will match whole of your string. Hence further groups in pattern _(.*)_(.*) have nothing to match, whole regex fails. The trick is to match all characters excluding _. This can be done by defining a group ([^_]+). This group defines a negative character set and it will match to any character except for _ . If you have better pattern, you can use them like [A-Za-z] or [:alphanum]. Once you slice your string to multiple sub strings separated by _, then just select 2nd and 3rd group.
ex:
SELECT REGEXP_SUBSTR( my_column,'(([^_]+))',1,2) as platform, REGEXP_SUBSTR( my_column,'(([^_]+))',1,3) as value from table;
Note: AFAIK there is no straight forward method to Oracle to exact matching groups. You can use regexp_replace for this purpose, but it unlike capabilities of other programming language where you can exact just group 2 and group 3. See this link for example.