Hive regexp_extract after second occurrence of delimiter

We have a Hive table column containing strings separated by ';', and we need to extract the string after the second occurrence of ';'.
+-----------------+
| col1 |
+-----------------+
| a;b;c;d |
| e;f; ;h |
| i;j;k;l |
+-----------------+
Required output:
+-----------+
| col1 |
+-----------+
| c |
| <null> |
| k |
+-----------+
select regexp_extract

Split the string on ';', which returns an array of values. Array indexes start at 0, so the element at index 2 is the value after the second ';'.
select split(col1,';')[2]
from tbl

If you want to convert empty and space-only strings to NULLs like in your example, then this macro can be useful:
create temporary macro empty_to_null(s string) case when trim(s)!='' then s end;
select empty_to_null(split(col1,'\\;')[2]) from tbl;
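To sanity-check the expected output outside Hive, here is a minimal Python sketch that mirrors the split-then-blank-to-NULL logic on the sample rows. The function name third_field is hypothetical, and treating a missing third field as None is an assumption of this sketch.

```python
from typing import Optional


def third_field(value: str) -> Optional[str]:
    """Return the field after the second ';', or None if it is missing or blank."""
    parts = value.split(";")
    # Index 2 is the third element, i.e. the text after the second ';'.
    field = parts[2] if len(parts) > 2 else None
    # Mirror empty_to_null: empty and space-only strings become None.
    if field is None or field.strip() == "":
        return None
    return field


rows = ["a;b;c;d", "e;f; ;h", "i;j;k;l"]
results = [third_field(r) for r in rows]
print(results)
```

The second row yields None because its third field contains only a space, which matches the <null> in the required output.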

Related

Check string for substring existence

How can I check whether a certain substring (for instance 18UT) is part of a string in a column?
Redshift's SUBSTRING function allows me to "cut" a certain substring based on a starting index and the length of the substring, but not to check whether a specific substring exists in the column's value.
Example:
+------------------+
| col |
+------------------+
| 14TH, 14KL, 18AB |
| 14LK, 18UT, 15AK |
| 14AB, 08ZT, 18ZH |
| 14GD, 52HG, 18UT |
+------------------+
Desired result:
+------------------+------+
| col | 18UT |
+------------------+------+
| 14TH, 14KL, 18AB | No |
| 14LK, 18UT, 15AK | Yes |
| 14AB, 08ZT, 18ZH | No |
| 14GD, 52HG, 18UT | Yes |
+------------------+------+
Here is one option:
select col,
case when ', ' || col || ', ' like '%, 18UT, %' then 'Yes' else 'No' end has_18ut
from mytable
While this will solve your immediate problem, it should be noted that storing delimited lists in a database table is bad practice and should be avoided. Each value should go in a separate row instead.
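The delimiter-wrapping trick can be tried locally. The sketch below runs the same expression against an in-memory SQLite database from Python, on the assumption that SQLite's || and LIKE behave like Redshift's for this case. The last row ('118UT, 14AB') is a hypothetical addition, not part of the question's data, included to show why the value is wrapped in ', ' on both sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col TEXT)")
conn.executemany(
    "INSERT INTO mytable (col) VALUES (?)",
    [
        ("14TH, 14KL, 18AB",),
        ("14LK, 18UT, 15AK",),
        ("14AB, 08ZT, 18ZH",),
        ("14GD, 52HG, 18UT",),
        ("118UT, 14AB",),  # hypothetical row: contains '18UT' only as part of '118UT'
    ],
)

query = """
SELECT col,
       CASE WHEN ', ' || col || ', ' LIKE '%, 18UT, %' THEN 'Yes' ELSE 'No' END AS has_18ut
FROM mytable
ORDER BY rowid
"""
flags = [row[1] for row in conn.execute(query)]
print(flags)
```

Padding both ends with the delimiter means the first and last list items are matched the same way as the middle ones, and a longer code such as '118UT' is not mistaken for '18UT'.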

Remove/delete values in a column SQL

I am very new to SQL and need some help.
I have a table whose values contain a comma:
+-------------------+
| Sample |
+-------------------+
| sdferewr,yyuyuy |
| q45345,ty67rt |
| wererert,rtyrtytr |
| werr,ytuytu |
+-------------------+
I want to remove everything after the comma (,) and keep only the value before it.
Required output:
+----------+
| Sample |
+----------+
| sdferewr |
| q45345 |
| wererert |
| werr |
+----------+
How can I do this in SQL?
Assuming the table is named TABLE_NAME and the column is named sample, in MySQL you can use:
update TABLE_NAME set sample=SUBSTRING_INDEX(`sample`, ',', 1)
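The simplest way to do that is
UPDATE table_name
SET column = substring(column for position(',' in column) - 1)
WHERE position(',' in column) > 0;
position(',' in column) returns the 1-based position of the comma (0 if there is none), and substring(column for n) returns the first n characters. Subtracting 1 leaves the comma itself out, and the WHERE clause skips rows that contain no comma.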
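As a quick check of the position-based approach, this Python sketch mirrors the same arithmetic on the sample values: find the 1-based comma position, then keep one character fewer so the comma is excluded. The helper name keep_before_comma is hypothetical.

```python
def keep_before_comma(value: str) -> str:
    """Keep only the text before the first comma; leave comma-free values unchanged."""
    # str.find is 0-based and returns -1 when absent, so +1 gives the
    # SQL-style 1-based position, with 0 meaning "no comma".
    position = value.find(",") + 1
    if position == 0:
        return value
    # First (position - 1) characters, i.e. everything before the comma.
    return value[: position - 1]


samples = ["sdferewr,yyuyuy", "q45345,ty67rt", "wererert,rtyrtytr", "werr,ytuytu"]
trimmed = [keep_before_comma(s) for s in samples]
print(trimmed)
```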

Redshift skip the first character of split_part()

I have a table column like below:
| column_a |
| ------------------ |
| Alpha_Black_1 |
| Alpha_Black_2323 |
| Alpha_Red_100 |
| Alpha_Blue_2344 |
| Alpha_Orange_33333 |
| Alpha_White_2 |
| |
Usually, when I want to split on a symbol or character, I use split_part(text, text, integer), for example split_part(column_a, '_', 1).
I need to remove the numeric part of each value and keep only the text part, like Alpha_Black.
I cannot use the trim function because the numeric part varies.
How can I skip the first underscore and split at the second one?
I would suggest using REGEXP_REPLACE here:
SELECT
column_a,
REGEXP_REPLACE(column_a, '_\\d+$', '') AS column_a_out
FROM yourTable;
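The pattern '_\d+$' removes a trailing underscore followed by digits and leaves everything else untouched. A small Python sketch with the same regular expression, applied to the sample values, shows the effect. Python's re module stands in for Redshift's REGEXP_REPLACE here, which is an assumption about equivalent behavior for this simple pattern.

```python
import re

values = [
    "Alpha_Black_1",
    "Alpha_Black_2323",
    "Alpha_Red_100",
    "Alpha_Blue_2344",
    "Alpha_Orange_33333",
    "Alpha_White_2",
]

# Strip the final "_<digits>" suffix, whatever its length.
stripped = [re.sub(r"_\d+$", "", v) for v in values]
print(stripped)
```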

Reverse Split_part SQL

Is there a way to split delimited data and select the second-to-last substring?
Sample Input:
*------------------------------------*
| Name |
*------------------------------------*
|Mike__NYC_180x9_School |
|Oak_Ann_1_LA_1x190_Uni |
|Tiger_King_Al_car_12_10x15_sample |
*------------------------------------*
Desired Output:
*--------------*
|Account number|
*--------------*
|180x9 |
|1x190 |
|10x15 |
*--------------*
Reverse the string, take the second part, and then reverse it again.
select reverse(split_part(reverse(Name),'_',2));
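The reverse, split, reverse idea can be traced step by step in Python. This is only a sketch of the logic, not of any particular database's split_part. The function name second_to_last is hypothetical, and returning an empty string when there are fewer than two parts is an assumption of this sketch.

```python
def second_to_last(name: str, delimiter: str = "_") -> str:
    """Return the second-to-last delimited part of name."""
    # Reversing the string turns "second from the end" into "second from the start".
    reversed_parts = name[::-1].split(delimiter)
    if len(reversed_parts) < 2:
        return ""
    # The extracted part is still reversed, so flip it back.
    return reversed_parts[1][::-1]


names = [
    "Mike__NYC_180x9_School",
    "Oak_Ann_1_LA_1x190_Uni",
    "Tiger_King_Al_car_12_10x15_sample",
]
accounts = [second_to_last(n) for n in names]
print(accounts)
```

Because only the last two parts matter, the number of underscores earlier in the string (including the doubled one in the first row) does not affect the result.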

SQL padding 0 to the left of a number in string

I am a beginner in SQL. I am using PostgreSQL and doing small exercises to learn. I have a column of strings named acronym from a destination table:
DO1
ES1
ES2
FR1
FR10
FR2
FR3
FR4
FR5
FR6
FR7
FR8
FR9
GP1
GP2
IN1
IN2
MU1
RU1
TR1
UA1
I would like to add a padding zero for acronym numbers that have only one digit, output:
DO01
ES01
ES02
FR01
FR02
FR03
FR04
FR05
FR06
FR07
FR08
FR09
FR10
GP01
GP02
IN01
IN02
MU01
RU01
TR01
UA01
How can I insert a zero to the left of the number in the string? I think a regex could do it, but I could not figure it out.
You can use the rpad() function to add characters to the end of the value:
select rpad(col, 4, '0')
In your case, though, you want the padding in the middle of the value. One simple method, assuming that the first two characters are letters, is:
(case when length(col) = 3
then left(col, 2) || '0' || right(col, 1)
else col
end)
Another possibility is using regexp_replace():
regexp_replace(col, '^([^0-9]{2})([0-9])$', '\10\2')
Both of these assume that the strings to be padded are three characters, which is consistent with your data. It is unclear what you want for other lengths.
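Both expressions can be checked against a few of the sample acronyms with a short Python sketch. The helper names pad_with_case and pad_with_regex are hypothetical, and Python's re module is used in place of PostgreSQL's regexp_replace(). Note that Python needs \g<1>0 in the replacement string, because \10 would be read as group 10.

```python
import re


def pad_with_case(col: str) -> str:
    """Mirror the CASE expression: insert '0' only in three-character values."""
    if len(col) == 3:
        return col[:2] + "0" + col[-1:]
    return col


def pad_with_regex(col: str) -> str:
    """Mirror the regexp_replace() call: two non-digits followed by one digit."""
    return re.sub(r"^([^0-9]{2})([0-9])$", r"\g<1>0\2", col)


acronyms = ["DO1", "ES2", "FR1", "FR10", "FR9", "UA1"]
padded_case = [pad_with_case(a) for a in acronyms]
padded_regex = [pad_with_regex(a) for a in acronyms]
print(padded_case)
print(padded_regex)
```

Values that already have two digits, such as FR10, pass through both versions unchanged.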
Try the to_char() function (note that this applies when the column is numeric rather than a string such as FR1):
select to_char(column1, 'fm000') as column2
from Test_table;
The fm ("fill mode") prefix avoids leading spaces in the resulting varchar.
000 defines the number of digits you want to have.
You can use string functions like lpad(), substr(), left():
select
concat(left(columnname, 2), lpad(substr(columnname, 3), 2, '0')) result
from tablename
Results:
| result |
| ------ |
| DO01 |
| ES01 |
| ES02 |
| FR01 |
| FR10 |
| FR02 |
| FR03 |
| FR04 |
| FR05 |
| FR06 |
| FR07 |
| FR08 |
| FR09 |
| GP01 |
| GP02 |
| IN01 |
| IN02 |
| MU01 |
| RU01 |
| TR01 |
| UA01 |