Hive query to extract part of the field after the matching pattern

I need a Hive query that uses regexp_extract to extract part of a field (type String). The value is a comma-separated list of colon-separated key:value pairs:
Field1 (String)
----------------
AAA:123,BBB:345,CCC:456,DDD:789,EEE:434
AAA:343,BBB:222,DDD:989,EEE:344
BBB:233,CCC:211,DDD:888,EEE:912
I need to extract the value of BBB:
Field1
-------
345
222
233
I tried regexp_extract but could not get the desired output.

Assuming your table is named temp and the string column is s, you can get the values with:
select regexp_extract(s, 'BBB:(.*?)(,)', 1) from temp;
Note that this pattern needs a comma after the value, so it does not match when BBB is the last pair in the string.
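A minimal sketch of how the lazy pattern behaves, using Python's re module as a stand-in for Hive's Java-based regex engine (the two agree on this simple pattern). The helper name extract_lazy is hypothetical, and returning None on no match is this sketch's choice, not a claim about Hive's return value. It shows the trailing-comma edge case:

```python
import re

# Same pattern as the Hive answer: lazily capture everything after "BBB:"
# up to the first comma that follows it.
LAZY = re.compile(r"BBB:(.*?)(,)")

def extract_lazy(s):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = LAZY.search(s)
    return m.group(1) if m else None

rows = [
    "AAA:123,BBB:345,CCC:456,DDD:789,EEE:434",
    "AAA:343,BBB:222,DDD:989,EEE:344",
    "BBB:233,CCC:211,DDD:888,EEE:912",
]
print([extract_lazy(r) for r in rows])  # ['345', '222', '233']

# Edge case: BBB is the last pair, so no comma follows its value.
print(extract_lazy("AAA:111,BBB:999"))  # None
```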

Use this regex:
select regexp_extract('AAA:123,BBB:345,CCC:456,DDD:789,EEE:434', '(BBB:)([\\d]+)', 2);
345
select regexp_extract('AAA:343,BBB:222,DDD:989,EEE:344', '(BBB:)([\\d]+)', 2);
222
select regexp_extract('BBB:233,CCC:211,DDD:888,EEE:912', '(BBB:)([\\d]+)', 2);
233
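The digits-only pattern avoids the trailing-comma problem because it stops at the first non-digit. Below is a Python sketch of the same logic (Python's re standing in for Hive's regex engine; in a Hive string literal the doubled backslash in '[\\d]' reaches the engine as \d). The ANCHORED variant is a hypothetical addition, not part of the answer: it only accepts BBB at the start of the string or right after a comma, so a key such as XBBB is not matched by accident.

```python
import re

# Equivalent of Hive's '(BBB:)([\\d]+)'; the value is capture group 2.
DIGITS = re.compile(r"(BBB:)([\d]+)")

# Hypothetical stricter variant: BBB must start the string or follow a comma.
ANCHORED = re.compile(r"(?:^|,)BBB:(\d+)")

def extract_digits(s, pattern=DIGITS, group=2):
    """Return the requested capture group of the first match, or None."""
    m = pattern.search(s)
    return m.group(group) if m else None

print(extract_digits("AAA:123,BBB:345,CCC:456,DDD:789,EEE:434"))  # 345
print(extract_digits("AAA:111,BBB:999"))                         # 999
print(extract_digits("XBBB:5,BBB:7"))                            # 5
print(extract_digits("XBBB:5,BBB:7", ANCHORED, 1))               # 7
```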

Related

AWS Athena: How can we get integer value as string with thousand comma separator in AWS Athena

How can we show integer numbers with a thousands comma separator?
For example, when executing the statement below
select 1234567890
how can we get the result as 1,234,567,890?

You can achieve this by casting the number to a string and using a regex:
with dataset(num) as (
values (1234567890),
(123456789),
(12345678),
(1234567),
(123456),
(12345),
(1234),
(123)
)
select regexp_replace(cast(num as VARCHAR), '(\d)(?=(\d\d\d)+(?!\d))', '$1,')
from dataset
Output:
_col0
1,234,567,890
123,456,789
12,345,678
1,234,567
123,456
12,345
1,234
123
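The regex works by putting a comma after any digit that is followed by a whole number of 3-digit groups running to the end of the number; the (?!\d) lookahead is what forces the groups to reach the end. A Python sketch of the same replacement, assuming Python's re as a stand-in for the Java-style engine Athena uses (Python writes the back-reference as \1 where Athena uses $1); with_commas is a hypothetical helper:

```python
import re

# Same pattern as the Athena query: a digit followed by one or more complete
# groups of three digits that are not followed by another digit.
THOUSANDS = re.compile(r"(\d)(?=(\d\d\d)+(?!\d))")

def with_commas(num):
    # Insert a comma after every digit the lookahead accepts.
    return THOUSANDS.sub(r"\1,", str(num))

for n in [1234567890, 123456789, 12345678, 1234567, 123456, 12345, 1234, 123]:
    print(with_commas(n))
```

In plain Python you would normally write f"{n:,}"; the regex version is only here to mirror the SQL.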

Hive regexp_extract numeric value from a string

I have a table like this:
column1
A.A=123; B.B=124; C.C=125
C.C=127
I am trying to get the numeric values from the table. The expected output is
A -> 123 / B -> 124 etc
I am trying to do this using regexp_extract. Any suggestions?

If the delimiters are fixed ('; ' between key-value pairs and '=' between key and value), you can use the str_to_map function:
select str_to_map('A.A=123; B.B=124; C.C=125','; ','=')['A.A'] as A --returns 123
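str_to_map first splits the text into pairs on the first delimiter, then splits each pair into key and value on the second, which is why a key lookup such as ['A.A'] works afterwards. A rough Python sketch of that idea (a hypothetical emulation splitting on plain strings; it does not try to reproduce every detail of Hive's implementation, and mapping a pair without '=' to None is this sketch's choice):

```python
def str_to_map(text, pair_delim="; ", kv_delim="="):
    """Split text into pairs, then each pair into key and value."""
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        # A pair without the key/value delimiter gets None in this sketch.
        result[key] = value if sep else None
    return result

m = str_to_map("A.A=123; B.B=124; C.C=125")
print(m["A.A"])  # 123
print(m)         # {'A.A': '123', 'B.B': '124', 'C.C': '125'}

# The second sample row only has C.C, so looking up A.A finds nothing.
print(str_to_map("C.C=127").get("A.A"))  # None
```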
If you prefer regexp:
select
regexp_extract('A.A=123; B.B=124; C.C=125','A.A=(\\d*)',1) as A, --returns 123
regexp_extract('A.A=123; B.B=124; C.C=125','B.B=(\\d*)',1) as B --returns 124
and so on
For case-insensitive matching, add (?i) to the regexp:
select regexp_extract('A.A=123; b.b=124; C.C=125','(?i)B.B=(\\d*)',1) as B --returns 124
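Two details of these patterns are worth checking: the inline (?i) flag, and the fact that the unescaped "." in A.A matches any character, not only a literal dot (in a Hive string literal an escaped dot would be written 'A\\.A'). A Python sketch, with Python's re standing in for Hive's engine and extract as a hypothetical helper:

```python
import re

ROW = "A.A=123; b.b=124; C.C=125"

def extract(key_pattern, s, flags=0):
    """Return the digits right after '<key>=', or None if there is no match."""
    m = re.search(key_pattern + r"=(\d*)", s, flags)
    return m.group(1) if m else None

print(extract("B.B", ROW))                 # None: case-sensitive, "b.b" != "B.B"
print(extract("(?i)B.B", ROW))             # 124: inline flag, as in the answer
print(extract("B.B", ROW, re.IGNORECASE))  # 124: same effect via a flag argument

# Unescaped "." matches any character; an escaped dot is literal.
print(extract("A.A", "AxA=9"))             # 9
print(extract(r"A\.A", "AxA=9"))           # None
```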

How to choose output length based on whether or not first character is a letter?

I am trying to return a column in SQL that should contain the first 4 characters when the value starts with a letter, and only the first 3 characters when it starts with a digit.
For example:
column:
B98497
C68756
r45789
123467
578912
Expected output for the above column:
column:
B984
C687
r457
123
578
I used the following code, but it returns only the first three characters:
select substring(column,1,3)
from table
The output of my code:
column
B98
C68
r45
123
578
How do I get output like this?
B984
C687
r457
123
578

One method is:
select left(col, 3 + (col rlike '^[a-zA-Z]'))
This uses the fact that boolean expressions evaluate to 1 or 0 in a numeric context.
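The same trick can be sketched in Python, where True and False also behave as 1 and 0 in arithmetic. This is an illustration of the idea only; prefix is a hypothetical helper and re.match (which anchors at the start, like ^) stands in for MySQL's RLIKE:

```python
import re

def prefix(col):
    # Mirrors left(col, 3 + (col rlike '^[a-zA-Z]')): the match result is a
    # boolean that counts as 1 or 0, so letter-prefixed values keep 4 chars.
    starts_with_letter = re.match(r"[a-zA-Z]", col) is not None
    return col[: 3 + starts_with_letter]

for v in ["B98497", "C68756", "r45789", "123467", "578912"]:
    print(prefix(v))  # prints B984, C687, r457, 123, 578 (one per line)
```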

If you already use MySQL 8.0, there's a one-liner:
SELECT REGEXP_SUBSTR(column1, '[a-z]?[0-9]{3}') FROM Table1;
If you're on 5.7, you have to check the first character against a letter character class, like this:
SELECT LEFT(column1, IF(column1 RLIKE '^[a-z]', 4, 3)) FROM Table1;
... which can actually be simplified, as shown in Gordon Linoff's answer.
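The MySQL 8.0 one-liner relies on the pattern '[a-z]?[0-9]{3}' (an optional letter, then exactly three digits) and on case-insensitive matching: MySQL's REGEXP_SUBSTR follows the collation of its arguments unless a match type is given, and with a case-insensitive collation [a-z] also matches upper-case letters such as the B in B98497. A Python sketch, using re.IGNORECASE as a stand-in for that collation behavior (regexp_substr here is a hypothetical helper, not MySQL's function):

```python
import re

# An optional letter followed by exactly three digits.
INSENSITIVE = re.compile(r"[a-z]?[0-9]{3}", re.IGNORECASE)
SENSITIVE = re.compile(r"[a-z]?[0-9]{3}")

def regexp_substr(value, pattern=INSENSITIVE):
    """Return the first match of pattern in value, or None."""
    m = pattern.search(value)
    return m.group(0) if m else None

for v in ["B98497", "C68756", "r45789", "123467", "578912"]:
    print(regexp_substr(v))  # prints B984, C687, r457, 123, 578 (one per line)

# Case-sensitive: [a-z] cannot match "B", so the match starts at the digits.
print(regexp_substr("B98497", SENSITIVE))  # 984
```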

The following should work (SQL Server syntax, where LIKE supports bracketed character ranges such as [0-9]):
select
CASE WHEN substring(TheColumn, 1,1) LIKE '[0-9]' THEN substring(TheColumn, 1,3)
ELSE substring(TheColumn, 1, 4) END
from [dbo].[TabAlpNum]

Capturing particular part of Integer Value from part of a String value

I have a table cust_attbr with a column attbr, which has values like:
{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}
........ ... ...
{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}
I need to write a single SELECT statement that fetches only the VAT_NUMBER value, like:
0835546003122
0835546003122
.... ... ..
null

With sample data you posted:
SQL> select * From test;
ID ATTBR
---------- ----------------------------------------------------------------------------------------------------------------
1 "{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
2 "{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}"
3 "{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}"
This might be one option:
SQL> select id,
regexp_substr(regexp_substr(attbr, 'VAT_NUMBER":"(\d+)?'), '\d+$') vat
from test;
ID VAT
---------- --------------------
1 0835546003122
2 0835546003122
3
The inner regexp_substr returns VAT_NUMBER":" followed by an optional number, while the outer one extracts only the digits anchored to the end of that substring. Row 3 has no digits after VAT_NUMBER":", so the outer call returns NULL.
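The two-step extraction can be sketched in Python to make each stage visible (Python's re standing in for Oracle's REGEXP_SUBSTR; vat_number is a hypothetical helper). The inner search returns the whole match, as REGEXP_SUBSTR does when no subexpression is requested, and the outer search keeps only a digit run at its end:

```python
import re

def vat_number(attbr):
    # Inner step: 'VAT_NUMBER":"' plus an optional run of digits.
    inner = re.search(r'VAT_NUMBER":"(\d+)?', attbr)
    if inner is None:
        return None
    # Outer step: only the digits anchored to the end of that substring.
    outer = re.search(r"\d+$", inner.group(0))
    return outer.group(0) if outer else None

rows = [
    '{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}',
    '{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}',
]
print([vat_number(r) for r in rows])  # ['0835546003122', None]
```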

If you're on 18c and the data is actual JSON (it currently is not, because of the double quotes around the curly braces and the ", กรุงค10110" fragment; it is unclear whether this comes only from your sample data), you could use the JSON_TABLE function:
WITH t (json_val) AS
(
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}' FROM DUAL UNION ALL
SELECT '{"SRCTAXAMT":"11300","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}' FROM DUAL
)
SELECT jt.*
FROM t,
JSON_TABLE(json_val, '$'
COLUMNS (vat_number VARCHAR2(50 CHAR) PATH '$."VAT_NUMBER"')) jt;
0835546003122
0835546003122

One option is to convert those column values to JSON syntax and then extract the values of the VAT_NUMBER keys, provided the DB version is 12c Release 1 or later. The problem is that the strings contain non-Latin characters (Thai script) and are not properly quoted. So we remove the part up to the TAXAMT key, prefix an opening curly brace ('{'), and then extract the VAT_NUMBER key's value with the JSON_VALUE() function:
SELECT JSON_VALUE(
'{'||REGEXP_REPLACE(str,'(.*10110",)(.*)','\2'),
'$.VAT_NUMBER'
) AS VAT_NUMBER
FROM tab --> your original data source
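The repair step can be sketched in Python to show why the result parses as JSON: the greedy (.*10110",) consumes everything up to and including the malformed fragment, \2 keeps the rest, and prefixing '{' closes the object again. This is a sketch only, with Python's re and json standing in for REGEXP_REPLACE and JSON_VALUE; vat_from_raw is hypothetical, the pattern is tied to the "10110" fragment in this sample data, and turning a blank value into None is an addition not in the answer (the question asks for null there):

```python
import json
import re

def vat_from_raw(raw):
    # Like REGEXP_REPLACE(str, '(.*10110",)(.*)', '\2'): keep only the text
    # after the last '10110",', dropping the malformed prefix.
    tail = re.sub(r'(.*10110",)(.*)', r"\2", raw)
    # Prefix '{' so the remainder is a valid JSON object, then read the key.
    value = json.loads("{" + tail).get("VAT_NUMBER")
    # Addition: treat a missing or blank value as None.
    if value is None or not value.strip():
        return None
    return value.strip()

rows = [
    '{"SRCTAXAMT":"11300",เอ็ก10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":"0835546003122"}',
    '{"SRCTAXAMT":"11300", กรุงค10110","TAXAMT":"11300","LOGID":"190301863","VAT_NUMBER":" "}',
]
print([vat_from_raw(r) for r in rows])  # ['0835546003122', None]
```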

Get the first value from a list of values a table index

My input is a two column table with headers _id and change_num. change_num is a string of comma-separated numbers that correspond to change IDs. For example:
_id change_num
123 4354, 3243, 7893
456 920, 1232, 9834, 2323
I want to get the first value in each row of change_num, so my output looks like this:
_id change_num
123 4354
456 920
How can I stop at the first comma and ignore everything after it? Furthermore, if a change_num starts with CN, can I ignore that prefix and just get the number?
_id change_num
123 CN4354, 3243, 7893
456 920, 1232, 9834, 2323
to return
_id change_num
123 4354
456 920

This is string manipulation. Something like this should work:
select t.id,
replace(substr(change_num, 1, instr(concat(change_num, ','), ',') - 1), 'CN', '')
from table t;
As you can tell, storing ids in a comma-separated list is a bad idea. If you have any control over the data structure, you should add a junction table.
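The substr/instr approach takes the text before the first comma and then strips CN. A Python sketch of the same logic (first_change is hypothetical). Appending a comma before searching is an addition here: it guarantees a comma exists, so a list with a single entry still returns that entry instead of an empty result:

```python
def first_change(change_num):
    # Append a comma so there is always at least one delimiter to find.
    s = change_num + ","
    first = s[: s.index(",")]    # text before the first comma
    return first.replace("CN", "")  # drop a CN prefix if present

for v in ["CN4354, 3243, 7893", "920, 1232, 9834, 2323", "555"]:
    print(first_change(v))  # prints 4354, 920, 555 (one per line)
```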

Try this:
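SELECT _id, SUBSTRING_INDEX(change_num, ',', 1) FROM your_table;
This is MySQL syntax (replace your_table with your table name). It returns the text before the first comma but does not strip a CN prefix.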

You can use the REGEXP_SUBSTR function in Oracle 10g or later (or the equivalent function in another database) with a regex that matches the first run of digits, which also skips a leading CN:
select _id, REGEXP_SUBSTR(change_num, '\d+') from tablet;
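Because \d+ matches the first run of one or more digits, it stops at the first comma and skips letters such as CN in one step. A Python sketch (re.search standing in for REGEXP_SUBSTR; first_number is hypothetical):

```python
import re

def first_number(change_num):
    # First run of digits: skips a leading "CN" and stops at the first comma.
    m = re.search(r"\d+", change_num)
    return m.group(0) if m else None

print(first_number("CN4354, 3243, 7893"))     # 4354
print(first_number("920, 1232, 9834, 2323"))  # 920
```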
