How to extract string between characters in Hive

I have a Hive table with a column which includes a string with multiple topic names. I am looking to split out the first topic name (and if possible the second and third). The string can contain up to 8 topic names.
The format of the string is:
["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]
I have tried the following, but I wanted to know whether there is a better way that does not require stripping the leading and trailing " characters in a later step, and that can also extract more than the first topic.
SELECT SUBSTR(split(l.Intent, '[\\,]')[0], 2) AS TOPIC_1
FROM Table l
Results:
"T.Topic1"
Thank you

You are really close to the solution.
I suggest tackling it in two stages:
1. Remove the array brackets.
2. Split the string.
regexp_extract(l.Intent, '^\\["(.*)"\\]', 1) gets the text inside the array, without the leading [" and the trailing "].
split(text, '", "') then splits that text into the array you want.
Putting it together:
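with l as (select '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]' as Intent)
select
  topics.array_of_topics[0] as topic_1,
  topics.array_of_topics[1] as topic_2,
  topics.array_of_topics[2] as topic_3
from (
  select split(regexp_extract(l.Intent, '^\\["(.*)"\\]', 1), '", "') as array_of_topics
  from l
) topics;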
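Each element of array_of_topics can be read by its 0-based index, so topics.array_of_topics[0], topics.array_of_topics[1] and topics.array_of_topics[2] are the first three topics (use [3] through [7] for the remaining ones).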
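To illustrate the same two-stage idea (strip the array wrapper, then split on the separator) outside Hive, here is a minimal Python sketch. The helper names extract_topics and first_n_topics are hypothetical, and the regular expression assumes every value has exactly the ["...", "..."] shape from the question.

```python
import re
from typing import List, Optional

# Stage 1: capture everything between the leading [" and the trailing "]
ARRAY_BODY = re.compile(r'^\["(.*)"\]$')


def extract_topics(intent: str) -> List[str]:
    match = ARRAY_BODY.match(intent)
    if match is None:
        return []  # value is not in the ["...", "..."] format
    # Stage 2: split on the literal separator between two quoted items
    return match.group(1).split('", "')


def first_n_topics(intent: str, n: int = 3) -> List[Optional[str]]:
    # Pad with None so there is always one slot per requested topic,
    # e.g. n=8 for the maximum of eight topics mentioned in the question
    topics = extract_topics(intent)[:n]
    return topics + [None] * (n - len(topics))


intent = '["T.Topic1", "T.Topic2", "T.Topic3", "S.Topic4", "S.Topic5"]'
print(first_n_topics(intent))  # ['T.Topic1', 'T.Topic2', 'T.Topic3']
```

Padding with None plays the role of a missing column value, so every row ends up with the same number of topic slots.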

Related

Parse a string into elements, where each element gets a column with a unique name (BigQuery SQL)

I'm not getting anywhere with ChatGPT :)
BigQuery SQL syntax.
Let's say I have a string of IPs separated by commas. The strings can have different lengths.
These are the strings:
First example:
'1.1.1.1, 12.12.12.12'
Second example:
'1.1.1.1, 12.12.12.12, 3.3.3.3'
I want to split the string on the commas.
As a result, I would like each element to be in its own column, named ip_ followed by its position in the original string.
First example:
ip_1, ip_2
1.1.1.1, 12.12.12.12
Second example:
ip_1, ip_2, ip_3
1.1.1.1, 12.12.12.12, 3.3.3.3
Could you please assist me with this query?
Thanks!
Consider the approach below:
select * from (
select list, offset, ip
from your_table, unnest(split(list)) ip with offset
)
pivot (any_value(trim(ip)) as ip for offset + 1 in (1,2,3))
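If applied to the sample data in your question, this returns the ip_1, ip_2 and ip_3 columns from your expected output (NULL where a string has fewer IPs).
You can refactor the above into a dynamic pivot, so that the IN list does not have to be hard-coded; there are plenty of posts here on SO showing the technique.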
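For reference, this minimal Python sketch shows what the split plus pivot computes for a single string, assuming a fixed number of positions like the hard-coded IN (1,2,3) list. The function name ips_to_columns is hypothetical.

```python
from typing import Dict, Optional


def ips_to_columns(ip_list: str, positions: int = 3) -> Dict[str, Optional[str]]:
    # Split on commas and trim each piece, like split(list) + trim(ip)
    parts = [p.strip() for p in ip_list.split(",")]
    # One key per requested position, like the pivot's IN (1, 2, 3) list;
    # positions past the end of the string map to None (SQL NULL),
    # and IPs beyond `positions` are dropped, as with a fixed IN list
    return {
        f"ip_{i}": parts[i - 1] if i <= len(parts) else None
        for i in range(1, positions + 1)
    }


print(ips_to_columns("1.1.1.1, 12.12.12.12"))
# {'ip_1': '1.1.1.1', 'ip_2': '12.12.12.12', 'ip_3': None}
```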

How to place a comma in a string based on character position in BigQuery?

I want to place a comma after every 4 characters of a string.
Ex : [00030004002900310057010801380139022403680374]
Need result as : [0003,0004,0029,0031,0057,0108,0138,0139,0224,0368,0374]
Someone please help me.
Consider the approach below:
select input_col,
(select string_agg(val, ',' order by offset)
from unnest(regexp_extract_all(input_col, r'.{4}')) val with offset
) output_col
from data
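If applied to the sample in your question, output_col contains 0003,0004,0029,0031,0057,0108,0138,0139,0224,0368,0374 (this assumes the square brackets in your example are not part of the stored value). Note that r'.{4}' silently drops a trailing group of fewer than four characters; use r'.{1,4}' if the length is not always a multiple of four.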
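The same chunk-and-join logic can be sketched in Python for a quick check. re.findall plays the role of regexp_extract_all and ",".join the role of string_agg; the function name and the keep_remainder flag are hypothetical.

```python
import re


def comma_every_four(value: str, keep_remainder: bool = False) -> str:
    # r".{4}" mirrors the SQL pattern and drops a short trailing group;
    # r".{1,4}" keeps that trailing group instead
    pattern = r".{1,4}" if keep_remainder else r".{4}"
    return ",".join(re.findall(pattern, value))


print(comma_every_four("00030004002900310057010801380139022403680374"))
# 0003,0004,0029,0031,0057,0108,0138,0139,0224,0368,0374
```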

How to split numbers and string into two columns using T-SQL

In my project, I have a column that contains numbers (including the "-" symbol) followed by a string, and I want to split it into two columns. The separator between the numbers and the string varies: it can be " " or " - ". Is it possible to solve this with a T-SQL query?
The query is used in the DevExpress WinForms designer.
Example:
Col:
343234-2321 String string
402-09-12 - Another string
Just string
303-404 - Text field
Expected result:
Col1        | Col2
343234-2321 | String string
402-09-12   | Another string
NULL        | Just string
303-404     | Text field
Thank you in advance!
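Assuming you always need to split your string right after the numeric part ends, as your sample data shows (and that every value contains at least one letter, since otherwise PATINDEX returns 0 and LEFT gets a negative length), one possible solution is to use PATINDEX: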
with s as (
select col, PatIndex('%[A-Za-z]%', col) d  -- position of the first letter
from t
)
select col,
NullIf(Trim(case when Substring(Left(col,d-1),d-2,1)='-' then
Left(col,d-3)
else Left(col,d-1) end),'') Col1,
NullIf(Trim(NullIf(Substring(col,d,Len(col)),'')),'') Col2
from s
See Example DB Fiddle
Note: if you are using SQL Server 2016 or earlier, you'll need to replace TRIM with nested LTRIM and RTRIM, since TRIM was added in SQL Server 2017.
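To check the splitting rule outside SQL Server, here is a minimal Python sketch of the same PATINDEX idea: find the first letter, keep what precedes it (minus a trailing " - " separator) as Col1, and the rest as Col2, with empty parts becoming None. The function name is hypothetical, and unlike the T-SQL version, which fails on a value with no letters, this sketch assumes such a value is purely numeric.

```python
import re
from typing import Optional, Tuple


def split_number_and_text(col: str) -> Tuple[Optional[str], Optional[str]]:
    # Like PATINDEX('%[A-Za-z]%', col), but 0-based: index of the first letter
    match = re.search(r"[A-Za-z]", col)
    if match is None:
        # Assumption: no letters means the whole value is the numeric part
        num = col.strip()
        return (num or None, None)
    d = match.start()
    left = col[:d].rstrip()
    # Drop a trailing "-" separator, as the CASE expression does
    if left.endswith("-"):
        left = left[:-1].rstrip()
    right = col[d:].strip()
    # NULLIF(..., '') equivalent: empty strings become None
    return (left or None, right or None)


for value in ["343234-2321 String string", "402-09-12 - Another string",
              "Just string", "303-404 - Text field"]:
    print(split_number_and_text(value))
```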

How to read &-separated data in a single column

Field description:
User_id: unique identifier of every user following these creators
Creator_id: list of creator IDs separated by ‘&’
User_id,Creator_IDs
U100,A300&A301&A302
U101,A301&A302
U102,A302
U103,A303&A301&A302
U104,A304&A301
U105,A305&A301&A302
U106,A301&A302
U107,A302
Note: I have to remove the U and A prefixes from the values. I thought I could use SUBSTRING for U, but what can I do for A, since the number of IDs varies?
Going forward, I also need to use this data to get the distinct creator IDs and the users following each of them.
You could try using regexp_replace, e.g.:
select regexp_replace(User_id, '^U', '') as User_id
, regexp_replace(regexp_replace(Creator_IDs, 'A', ''), '&', ',') as Creator_IDs
from your_table  -- placeholder for your table name
You can use the REPLACE function to remove the A from the string, something like this:
SELECT REPLACE('A300&A301&A302', 'A','') AS NewString;
For the entire query:
select concat (REPLACE('U100', 'U',''),',',REPLACE('A300&A301&A302', 'A',''));
You can run this to see how it works. In your query, of course, you have to use your own column and table names (your_table below is a placeholder):
select concat(REPLACE(User_id, 'U', ''), ',', REPLACE(Creator_IDs, 'A', '')) from your_table;
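For the second part of the question (the distinct creator IDs and the users following each one), a minimal Python sketch of the data transformation is below. It assumes the rows are available as the User_id,Creator_IDs text lines shown in the question; followers_by_creator is a hypothetical name, and str.removeprefix needs Python 3.9 or later. In SQL, the equivalent step would typically be to split the list, unnest or explode it into one row per creator, and then group by creator.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

ROWS = """U100,A300&A301&A302
U101,A301&A302
U102,A302
U103,A303&A301&A302
U104,A304&A301
U105,A305&A301&A302
U106,A301&A302
U107,A302""".splitlines()


def followers_by_creator(lines: Iterable[str]) -> Dict[str, List[str]]:
    result: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        user_id, creator_ids = line.split(",")
        user = user_id.removeprefix("U")  # strip the U prefix
        # One entry per creator in the &-separated list, A prefix removed
        for creator in creator_ids.split("&"):
            result[creator.removeprefix("A")].append(user)
    return dict(result)


for creator, users in sorted(followers_by_creator(ROWS).items()):
    print(creator, users)
```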

How to select specific data between Quotes (")

I am reposting my question, as I am new to SQL Server 2012.
I want to fetch the numeric data between quotes (") in the following rows:
row1:'asdalknd,"1,2,3,4",slknsdl,"5,6,7,8",snlsn'
row2:'asknd,"111,267,387,4756",snsdl,"534,646,767,348",snlssdsdsdsjkvkn'
row3'....
row4'....
row5'....
row6'...
row7'...
row8'....
These are all rows of a single column.
I just want to extract the numbers (perhaps into another column for each row).
Can anybody please help? This is well beyond my basic knowledge of T-SQL.
Thanks
This is ugly, but it works (shown here for a single value in a variable):
DECLARE @COLUMN varchar(100) = 'jksjdksls#$#$##kskjfjf,"123,456,789" lsnslkdswfnslsjfls';
SELECT LEFT(
    RIGHT(@COLUMN, LEN(@COLUMN) - CHARINDEX('"', @COLUMN)),
    CHARINDEX('"', RIGHT(@COLUMN, LEN(@COLUMN) - CHARINDEX('"', @COLUMN))) - 1
);
--> 123,456,789
This is what it does:
We take the string 'jksjdksls#$#$##kskjfjf,"123,456,789" lsnslkdswfnslsjfls'.
We find the first occurrence of " with CHARINDEX('"', @COLUMN), which returns 24.
We take the right-hand end of the string after that quote with RIGHT: the length of the string, LEN(@COLUMN) (55), minus the position of the first " (24) keeps the last 31 characters.
Then we find the second " with CHARINDEX in that right-hand string, which we have to build again with RIGHT(@COLUMN, LEN(@COLUMN) - CHARINDEX('"', @COLUMN)), and subtract 1 so the " itself is excluded. LEFT then returns 123,456,789.
This extracts only the first quoted value; to get the second one, apply the same steps to the rest of the string after the second ". For a table, replace @COLUMN with your column name.
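As a cross-check of the logic, here is a minimal Python sketch with two hypothetical helpers: first_quoted_value mirrors the find-first-quote, find-second-quote steps above, and quoted_values uses a regular expression to return every quoted group, which covers rows that contain two quoted lists.

```python
import re
from typing import List, Optional


def first_quoted_value(text: str) -> Optional[str]:
    start = text.find('"')  # 0-based, so CHARINDEX's 24 is 23 here
    if start == -1:
        return None
    end = text.find('"', start + 1)  # the second quote
    if end == -1:
        return None
    return text[start + 1:end]  # exclude both quote characters


def quoted_values(text: str) -> List[str]:
    # Every run of non-quote characters enclosed in a pair of double quotes
    return re.findall(r'"([^"]*)"', text)


print(quoted_values('asdalknd,"1,2,3,4",slknsdl,"5,6,7,8",snlsn'))
# ['1,2,3,4', '5,6,7,8']
```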