Extract the last element of a split string - Hive

I'm trying to split a string on a predetermined character and then extract the final value of the returned list.
For example, my string may take the form:
name
WAYNE.ROONEY.226
ROSS.BARKLEY.HELLO.113
ADAM.A122
Pythonically, what I'm trying to do is:
for x in names:
    my_val = x.split('.')[-1]  # return the last element of the list when split on '.'
For example, the desired output is:
name value
WAYNE.ROONEY.226 226
ROSS.BARKLEY.HELLO.113 113
ADAM.A122 A122
Can anyone give me some pointers for either Hive or Impala, please?
Ideally I would create this as a view, but I'm also happy to generate the actual output and then re-upload it to a table.
Thank you!

For Hive:
select regexp_extract(NAME, '\\.([^\\.]+)$', 1) as VALUE
from WHATEVER
And please, learn the power of regular expressions.
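For comparison, here is a small Python sketch of the same regular expression next to the asker's split('.')[-1] idea. The helper names are hypothetical. The two differ on a value with no dot: the split returns the whole string, while the regex finds no match, and this sketch returns an empty string in that case by choice.

```python
import re

# Mirrors the Hive pattern: a literal dot, then a captured run of
# non-dot characters anchored at the end of the string.
LAST_SEGMENT = re.compile(r"\.([^.]+)$")


def last_segment_regex(name):
    """Text after the final '.', or '' when there is no match (sketch choice)."""
    m = LAST_SEGMENT.search(name)
    return m.group(1) if m else ""


def last_segment_split(name):
    """The asker's Python idea: split on '.' and take the last piece."""
    return name.split(".")[-1]


for name in ["WAYNE.ROONEY.226", "ROSS.BARKLEY.HELLO.113", "ADAM.A122"]:
    print(name, last_segment_regex(name), last_segment_split(name))
```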

Related

How to get a value string with a regexp in BigQuery

Hi, I have a string in a BigQuery column like this:
cancellation_amount: 602000
after_cancellation_transaction_amount: 144500
refund_time: '2022-07-31T06:05:55.215203Z'
cancellation_amount: 144500
after_cancellation_transaction_amount: 0
refund_time: '2022-08-01T01:22:45.94919Z'
I'm already using this logic to get cancellation_amount:
regexp_extract(file,r'.*cancellation_amount:\s*([^\n\r]*)')
but the output is only the amount 602000. I need 602000 and 144500 to come out as different columns.
I'd appreciate any help.
If the lines in your input (which will eventually become columns) are fixed, you can use multiple regexp_extract calls to get all the values.
SELECT
regexp_extract(file, r'cancellation_amount:\s*([^\n\r]*)') AS cancellation_amount,
regexp_extract(file, r'after_cancellation_transaction_amount:\s*([^\n\r]*)') AS after_cancellation_transaction_amount
FROM table_name
One issue I found with your regex is that .*cancellation_amount won't match after_cancellation_transaction_amount.
There is also a function called regexp_extract_all, which returns all the matches as an array that you can later break out into columns, but if you have a fixed number of values, separating them into different columns directly is easier.
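The same idea can be tried outside BigQuery with a rough Python sketch: one search per key (like separate regexp_extract calls) and a findall per key (like regexp_extract_all). It assumes each "key: value" pair sits on its own line; the helper names are hypothetical, and Python's re is used here only as an analogue of BigQuery's regex functions.

```python
import re

# Sample input copied from the question; assumed to be one string with
# one "key: value" pair per line.
TEXT = """cancellation_amount: 602000
after_cancellation_transaction_amount: 144500
refund_time: '2022-07-31T06:05:55.215203Z'
cancellation_amount: 144500
after_cancellation_transaction_amount: 0
refund_time: '2022-08-01T01:22:45.94919Z'"""


def field_pattern(field):
    # '^' with re.MULTILINE anchors the key at the start of a line, so
    # 'cancellation_amount' cannot match inside
    # 'after_cancellation_transaction_amount'. [ \t]* (rather than \s*)
    # stops the match from running onto the next line if a value is empty.
    return re.compile(rf"^{re.escape(field)}:[ \t]*([^\n\r]*)", re.MULTILINE)


def extract_first(text, field):
    """Analogous to REGEXP_EXTRACT: the first captured value, or None."""
    m = field_pattern(field).search(text)
    return m.group(1) if m else None


def extract_all(text, field):
    """Analogous to REGEXP_EXTRACT_ALL: every captured value, in order."""
    return field_pattern(field).findall(text)


# One "column" per field, as in the multiple-regexp_extract answer.
row = {
    "cancellation_amount": extract_first(TEXT, "cancellation_amount"),
    "after_cancellation_transaction_amount": extract_first(
        TEXT, "after_cancellation_transaction_amount"
    ),
}
print(row)
print(extract_all(TEXT, "cancellation_amount"))
```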

How to retrieve the required string in SQL from a variable-length parameter

Here is my problem statement:
I have a single-column table with data like this:
ROW-1>> 7302-2210177000-XXXX-XXXXXX-XXX-XXXXXXXXXX-XXXXXX-XXXXXX-U-XXXXXXXXX-XXXXXX
ROW-2>> 0311-1130101-XXXX-000000-XXX-XXXXXXXXXX-XXXXXX-XXXXXX-X-XXXXXXXXX-WIPXXX
I want to split these values on '-' and load them into a new table. Each string has 11 segments separated by '-', and therefore 11 columns. The problem is:
A. The length of these values varies; I have to keep each value at its standard length, or at whatever length it actually has.
e.g. 7302- should have four characters, but if the value is shorter than that, keep it as is: for 73, it should populate 73.
So I have to split the values while maintaining their integrity. The code I am writing is:
select
SUBSTR(PROFILE_ID,1,(case when length(instr(PROFILE_ID,'-')<>4) THEN (instr(PROFILE_ID,'-') else SUBSTR(PROFILE_ID,1,4) end)
)AS [RQUIRED_COLUMN_NAME]
from [TABLE_NAME];
I'm getting a right parenthesis error.
Please help.
I used the REGEXP_SUBSTR SQL function to solve the above issue. Here is an example:
select regexp_substr('7302-2210177000-XXXX-XXXXXX-XXX-XXXXXXXXXX-XXXXXX-XXXXXX-U-XXXXXXXXX-XXXXXX', '[^-]+', 1, 1) from dual;
Output: 7302, which is the first segment of the string.
Similarly, the second "-"-separated segment can be obtained by replacing the final argument (the occurrence, 1) with 2:
select regexp_substr('7302-2210177000-XXXX-XXXXXX-XXX-XXXXXXXXXX-XXXXXX-XXXXXX-U-XXXXXXXXX-XXXXXX', '[^-]+', 1, 2) from dual;
Output: 2210177000, which is the second segment of the string.
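A short Python sketch shows why this keeps each segment at its actual length, and one caveat: the pattern '[^-]+' only finds non-empty runs, so an empty segment (as in 'A--B') is skipped rather than returned as an empty column, whereas a plain split keeps it. The segment helper is hypothetical and only mimics the occurrence argument of REGEXP_SUBSTR.

```python
import re

ROW_1 = "7302-2210177000-XXXX-XXXXXX-XXX-XXXXXXXXXX-XXXXXX-XXXXXX-U-XXXXXXXXX-XXXXXX"


def segment(value, n):
    """Mimic REGEXP_SUBSTR(value, '[^-]+', 1, n): the n-th (1-based) run of
    characters other than '-', or None (like SQL NULL) if there are fewer runs."""
    runs = re.findall(r"[^-]+", value)
    return runs[n - 1] if 1 <= n <= len(runs) else None


# Splitting keeps every segment at whatever length it actually has,
# so a short value such as 73 stays 73.
columns = ROW_1.split("-")
print(len(columns), columns[0], columns[1])
print(segment(ROW_1, 1), segment(ROW_1, 2))

# The two approaches differ when a segment is empty: '[^-]+' skips it.
print("A--B".split("-"), segment("A--B", 2))
```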

Creating a column containing only a specific value of another column

I have a table such as:
column
abcx sample 6.5oz
bbcd sku 2ct
tty 80z
rre pool 65g box
How can I create a new column in my select statement that gives me just the size from each row's value (6.5oz, 2ct, 80z, 65g)?
Desired output:
size
6.5oz
2ct
80z
65g
The question is not specified precisely.
Here's a solution assuming you're interested in "a sequence of digits, possibly containing a dot, immediately followed by a sequence of lowercase letters".
If so, this should do it:
select col, regexp_substr(col, '[\\d\\.]+[a-z]+') from test;
If this is not what you're looking for, please make the question more specific.
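To sanity-check the pattern outside the database, here is a Python sketch with the same regex. In a Python raw string the backslash is written once, and inside a character class the dot is already literal. The sample values come from the question; size_of is a hypothetical helper.

```python
import re

# Same pattern as the answer: digits and dots, then lowercase letters.
SIZE = re.compile(r"[\d.]+[a-z]+")

ROWS = ["abcx sample 6.5oz", "bbcd sku 2ct", "tty 80z", "rre pool 65g box"]


def size_of(value):
    """Return the first size-like token, or None if nothing matches."""
    m = SIZE.search(value)
    return m.group(0) if m else None


for value in ROWS:
    print(value, "->", size_of(value))
```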

How do I write a SQL query to fetch data matching a specific format/pattern of string?

I need to write a SQL query to fetch data matching a specific string format. I need to fetch the records where the LOC column looks like one of the following:
Cx-xxx-Lx
Or
Cxx-xxx-Lx
Or
x-xxx-Lx
Or
xx-xxx-Lx
Or
xxxxxLx_x
Or
xxxxxLx_xx
Or
BxxxLLxxxx
Where:
x is a number (0 to 9)
L is a letter (A to Z)
Currently I filter the LOC column to fetch rows where the value's length is either 9 or 10. Although this returns the correct data from the DB, it is not the right way to do it.
My current SQL:
select * from table
where length(LOC) in (9,10)
Any help would be appreciated.
You can use regular expressions. Something like:
where regexp_like(LOC, '^C?[0-9]{1,2}-[0-9]{3}-[A-Z][0-9]$') or
regexp_like(LOC, '^[0-9]{5}[A-Z][0-9]_[0-9]{1,2}$') or
regexp_like(LOC, '^B[0-9]{3}[A-Z]{2}[0-9]{4}$')
You can combine these into one regular expression using |. I think it is easier to follow and debug as three separate expressions.
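If you do want a single expression, wrap the alternatives in one group so that ^ and $ apply to every branch; without the group, ^ binds only to the first alternative and $ only to the last. Here is a Python sketch that builds the combined pattern and checks it against sample values (the LOC values are invented to fit the described formats, and loc_matches is a hypothetical helper). A plain capturing group is used rather than (?:...), on the assumption that the database's regex flavor may not support non-capturing groups.

```python
import re

# The three expressions from the answer, without their own ^ and $.
ALTERNATIVES = [
    r"C?[0-9]{1,2}-[0-9]{3}-[A-Z][0-9]",  # Cx-xxx-Lx, Cxx-xxx-Lx, x-xxx-Lx, xx-xxx-Lx
    r"[0-9]{5}[A-Z][0-9]_[0-9]{1,2}",     # xxxxxLx_x, xxxxxLx_xx
    r"B[0-9]{3}[A-Z]{2}[0-9]{4}",         # BxxxLLxxxx
]

# One group around the alternation so ^ and $ apply to every branch.
COMBINED = "^(" + "|".join(ALTERNATIVES) + ")$"

# The pitfall: without the group, ^ and $ attach to single branches only.
UNGROUPED = "^" + "|".join(ALTERNATIVES) + "$"


def loc_matches(loc):
    return re.search(COMBINED, loc) is not None


for loc in ["C1-234-A5", "12-345-B6", "12345A6_78", "B123AB4567", "C123-456-A1"]:
    print(loc, loc_matches(loc))

# Matches by mistake: the last branch only needs to touch the end.
print(re.search(UNGROUPED, "xxB123AB4567") is not None)
```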

Construct a table name dynamically in Pentaho

I am building a Pentaho transformation with a Table Input step. My requirement was to pass the table name to the Table Input step dynamically as a variable, and I achieved this by doing:
select * from ${table_name}
When I run the transformation, I pass in the value of table_name.
This works.
But my new requirement is to pass a date as a variable and then construct the table name from the date's year and month.
For example, if I pass in 2012-01-31, I want SQL like this:
select * from xxx_201201_v
I cannot use a substr like this:
select * from xxx_substr(${input_date} ,0,4)
so I am confused about how to do this.
You could use a Get Variables step to put your date in a field, use string-manipulation steps to build the table name, and then pass the field to the Table Input step like this:
select * from ?
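Of course, you should also enable the Table Input step's option to get fields from a previous step. If you have 2012-01-31, you can get 201201 with a find-and-replace step that replaces "-" with nothing, and then cut the string down to its first 6 digits; add the xxx_ prefix and _v suffix so the field holds the full table name. Be aware that the ? is bound as a JDBC parameter value, and databases generally do not accept a bound parameter in place of a table name, so check that this works with your database.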
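To make the expected result of those string steps concrete, here is a Python sketch of the same transformation. table_name_for and its prefix/suffix defaults are hypothetical, taken from the xxx_201201_v example in the question; in Pentaho itself the work would be done by the steps described above, not by this code.

```python
from datetime import date


def table_name_for(input_date, prefix="xxx_", suffix="_v"):
    """Build e.g. 'xxx_201201_v' from '2012-01-31' (hypothetical helper)."""
    # Reject strings that are not valid ISO dates (raises ValueError).
    date.fromisoformat(input_date)
    # Same steps as the answer: remove '-', then keep the first 6 digits (YYYYMM).
    yyyymm = input_date.replace("-", "")[:6]
    return prefix + yyyymm + suffix


print(table_name_for("2012-01-31"))
```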