create new columns from xml value in hive - sql

I have a column desc_txt in my table whose contents resemble XML, as shown below:
desc_txt
-----------
<td><strong>Criticality</strong></td><td>High</td></tr><td><strong>Country</strong></td><td>India</td></tr><tr><td><strong>City</strong></td><td>Indore</td>
The requirement is to create a new table or view from this table with additional columns such as Criticality, Country, and City, holding the values High, India, and Indore, respectively.
How can this be achieved in Hive/Impala?

This can be done in two steps. I assumed you have only four columns to pull.
Load the data as is into a table, with everything in a single column.
Then use the SQL below to split the data into multiple columns. I assumed 4 columns; you can add more as required.
with t as (
  SELECT rtrim(ltrim(
           regexp_replace(
             replace(
               trim(
                 regexp_replace(
                   regexp_replace("<td><strong>Criticality</strong></td><td>High</td></tr><td><strong>Country</strong></td><td>India</td></tr><tr><td><strong>City</strong></td><td>Indore</td>","</?[^>]*>",","),
                   ',,', ','
                 )
               ),
               ' ,', ','
             ),
             '(,){2,}', ','
           ),
         ','), ',') str
)
select split_part(str, ',', 1) as first_col,
       split_part(str, ',', 2) as second_col,
       split_part(str, ',', 3) as third_col,
       split_part(str, ',', 4) as fourth_col
from t
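The query is tricky: first it replaces every tag with a comma, then it collapses runs of commas into a single comma, then it strips the comma from the start and end of the string. split_part then splits the whole string on commas to create the individual columns. Note that the cleaned string alternates label and value (Criticality,High,Country,India,City,Indore), so the values themselves sit in the even positions 2, 4, and 6.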
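The same sequence of steps can be sketched outside the database to check the intermediate strings. This is a minimal Python sketch, not the Hive/Impala query itself; the function names html_cells_to_fields and fields_to_columns are made up for illustration, and it assumes the cell text never contains a comma.

```python
import re

def html_cells_to_fields(fragment):
    """Turn a tag-delimited fragment into a flat list of its text pieces."""
    # Step 1: replace every opening or closing tag with a comma.
    with_commas = re.sub(r"</?[^>]*>", ",", fragment)
    # Step 2: collapse runs of commas into a single comma.
    collapsed = re.sub(r",{2,}", ",", with_commas)
    # Step 3: strip the leading and trailing comma, then split.
    return collapsed.strip(",").split(",")

def fields_to_columns(fields):
    """Pair each label (odd position) with the value that follows it."""
    return dict(zip(fields[0::2], fields[1::2]))

sample = (
    "<td><strong>Criticality</strong></td><td>High</td></tr>"
    "<td><strong>Country</strong></td><td>India</td></tr>"
    "<tr><td><strong>City</strong></td><td>Indore</td>"
)
print(fields_to_columns(html_cells_to_fields(sample)))
```

Pairing labels with values this way is only one possible design; it relies on the fragment strictly alternating label and value.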
HTH...

Related

Remove items in a delimited list that are non numeric in SQL for Redshift

I am working with a field called codes that is a delimited list of values, separated by commas. Within each item there is a title ending in a colon and then a code number following the colon. I want a list of only the code numbers after each colon.
Example Value:
name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765
Desired Output:
3278648990379886572,6886328308933282817,1273671130817907765
The title always starts with a letter and ends with a colon, so I can see how REGEXP_REPLACE might work by replacing any string that starts with a letter and ends with a colon with '', but I am not good at REGEXP_REPLACE patterns. Chat GPT is down fml.
Side note, if anyone knows of a good guide for understanding pattern notation for regular expressions it would be much appreciated!
I tried this, but it is not working: REGEXP_REPLACE(REPLACE(REPLACE(codes,':', ' '), ',', ' ') ,' [^0-9]+ ', ' ')
This solution assumes a few things:
No colons anywhere else except immediately before the numbers
No number at the very start
At a high level, this query finds how many colons there are, splits the entire string into that many parts, and then only keeps the number up to the comma immediately after the number, and then aggregates the numbers into a comma-delimited list.
Assuming a table like this:
create temp table tbl_string (id int, strval varchar(1000));
insert into tbl_string
values
(1, 'name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765');
with recursive cte_num_of_delims AS (
    select max(regexp_count(strval, ':')) AS num_of_delims
    from tbl_string
), cte_nums(nums) AS (
    select 1 as nums
    union all
    select nums + 1
    from cte_nums
    where nums <= (select num_of_delims from cte_num_of_delims)
), cte_strings_nums_combined as (
    select id,
           strval,
           nums as index
    from cte_nums
    cross join tbl_string
), prefinal as (
    select *,
           split_part(strval, ':', index) as parsed_vals
    from cte_strings_nums_combined
    where parsed_vals != ''
      and index != 1
), final as (
    select *,
           case
               when charindex(',', parsed_vals) = 0
                   then parsed_vals
               else left(parsed_vals, charindex(',', parsed_vals) - 1)
           end as final_vals
    from prefinal
)
select listagg(final_vals, ',')
from final
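For checking the logic against sample values, the approach described above (split on the colon, skip the first part, keep each remaining part up to its first comma, then rejoin) can be sketched in a few lines of Python. This is only an illustration of the idea, not Redshift code; extract_codes is a hypothetical name, and the sketch relies on the same two assumptions as the answer.

```python
def extract_codes(codes):
    """Return the comma-joined numbers that follow each colon."""
    parts = codes.split(":")
    # parts[0] is the title before the first colon, so it is skipped.
    # Every later part starts with a number, optionally followed by
    # ",next-title"; keep only the text before that comma.
    numbers = [part.split(",", 1)[0] for part in parts[1:] if part]
    return ",".join(numbers)

value = (
    "name-form-na-stage0:3278648990379886572,"
    "rules-na-unwanted-sdfle2:6886328308933282817,"
    "us-disdg-order-stage1:1273671130817907765"
)
print(extract_codes(value))
```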

How to ignore first value and first comma in comma separated string in SQL Server

I have a simple comma-separated list of values.
All I want to do is ignore the first value and the first comma in the list
'Hello1, Hello2, Hello3, Hello4'
I want result as 'Hello2, Hello3, Hello4'
I want to ignore 'Hello1' and First ','
You can use stuff():
select stuff(str, 1, charindex(',', str + ',') + 1, '')
Storing lists of things in strings usually indicates that something is wrong with the database design. You should be storing these values in separate rows of a table.
You can also do:
select ltrim(stuff(col, 1, charindex(',', col), ''))
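Both answers delete everything up to and including the first comma and then drop the space that follows it. A minimal Python sketch of that idea is below; drop_first_item is a hypothetical name, and returning an empty string when there is no comma at all is an assumption of this sketch.

```python
def drop_first_item(csv_text):
    """Remove the first item, its comma, and the whitespace after the comma."""
    _, sep, rest = csv_text.partition(",")
    if not sep:
        # Assumption: a single-item list has nothing left once its
        # first item is dropped.
        return ""
    return rest.lstrip()

print(drop_first_item("Hello1, Hello2, Hello3, Hello4"))  # Hello2, Hello3, Hello4
```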

Using string methods in a SELECT query to select up to the second space?

In an MS-Access database I'm working with, one of the tables has a field called "Name". The format of this field will generally be along the lines of "firstname surname integer", but sometimes may just be "firstname surname".
I need to select just the first name and the surname from the name field.
I've looked at using the Left function
SELECT DISTINCT LEFT([Name], x)
However since names are different lengths, this isn't going to work since there is no constant integer to use as the second parameter. Nor can it be used with
SELECT DISTINCT LEFT(InStr([Name], " "), x)
for the above reason, but also because that would split the field at the first space.
Is there a way using LEFT, TRIM, SPLIT or any other string manipulation that I can create a query to select just the first two parts of the name? I need the space included.
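You can try this. The nested InStr finds the second space, and Left keeps everything before it; if there is no second space, the whole field is returned.
SELECT DISTINCT IIf( InStr( InStr([Name], ' ') + 1, [Name], ' ') > 0, Left( [Name], InStr( InStr([Name], ' ') + 1, [Name], ' ') - 1 ), [Name])
FROM MyTable;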
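The "find the first space, then search again from just after it" logic can be sketched in Python for testing against sample names. This is not Access code; first_two_words is a hypothetical name, and this sketch stops just before the second space so that the result has no trailing space.

```python
def first_two_words(name):
    """Return the text before the second space, or the whole name."""
    first_space = name.find(" ")
    if first_space == -1:
        return name  # single word, nothing to cut
    # Start the second search one character after the first space.
    second_space = name.find(" ", first_space + 1)
    if second_space == -1:
        return name  # only "firstname surname", keep it all
    return name[:second_space]

print(first_two_words("John Smith 42"))  # John Smith
```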

How to remove specific value from comma separated string in oracle

I want to remove a specific value from a comma-separated string using Oracle.
Sample Input -
col
1,2,3,4,5
Suppose I want to remove 3 from the string.
Sample Output -
col
1,2,4,5
Please suggest how I can do this with an Oracle query.
Thanks.
Here is a solution that uses only standard string functions (rather than regular expressions), which should result in faster execution in most cases. It removes 3 only when it is a whole list item: the first character followed by a comma, the last character preceded by a comma, or preceded and followed by a comma. When the 3 is the last item, the comma that precedes it is removed with it; in the other two cases, the comma that follows it is removed.
It is able to remove two 3's in a row (which some of the other solutions offered are not able to do) while leaving in place consecutive commas (which presumably stand in for NULL), and it does not disturb numbers like 38 or 123.
The strategy is to first double up every comma (replace , with ,,) and then prepend and append a comma to the string. Then remove every occurrence of ,3,. From what is left, replace every ,, back with a single , and finally remove the leading and trailing ,.
with
  test_data ( str ) as (
    select '1,2,3,4,5'     from dual union all
    select '1,2,3,3,4,4,5' from dual union all
    select '12,34,5'       from dual union all
    select '1,,,3,3,3,4'   from dual
  )
select str,
       trim(both ',' from
         replace( replace(',' || replace(str, ',', ',,') || ',', ',3,'), ',,', ',')
       ) as new_str
from   test_data
;
STR            NEW_STR
-------------  ----------
1,2,3,4,5      1,2,4,5
1,2,3,3,4,4,5  1,2,4,4,5
12,34,5        12,34,5
1,,,3,3,3,4    1,,,4

4 rows selected.
Note: As pointed out by MT0 (see Comments below), this will trim too much if the original string begins or ends with commas. To cover that case, instead of wrapping everything within trim(both ',' from ...), I should wrap the rest within a subquery and use something like substr(new_str, 2, length(new_str) - 2) in the outer query.
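The comma-doubling strategy, including the substr-style fix from the note, can be traced step by step in a small Python sketch. This is an illustration of the technique rather than Oracle code; remove_value is a hypothetical name. Python's str.replace scans left to right without overlapping matches, which is the behaviour the trick depends on.

```python
def remove_value(csv_text, value):
    """Remove every whole item equal to `value` from a comma-separated string."""
    # Double every comma and wrap the string in commas, so each item
    # gets its own pair of delimiters: "1,3,3" -> ",1,,3,,3,".
    wrapped = "," + csv_text.replace(",", ",,") + ","
    # Each ",value," now matches exactly one item, even adjacent ones.
    removed = wrapped.replace("," + value + ",", "")
    # Undo the doubling.
    collapsed = removed.replace(",,", ",")
    # Drop exactly one leading and one trailing comma (the wrapper),
    # so commas that belonged to the original string survive.
    return collapsed[1:-1]

for s in ["1,2,3,4,5", "1,2,3,3,4,4,5", "12,34,5", "1,,,3,3,3,4"]:
    print(s, "->", remove_value(s, "3"))
```

Slicing off one character at each end, instead of trimming all commas, is what keeps a leading or trailing empty item in place.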
Here is one method:
select trim(both ',' from replace(',' || '1,2,3,4,5' || ',', ',' || '3' || ',', ','))
That said, storing comma-delimited strings is a really, really bad idea. There is almost no reason to do such a thing. Oracle supports JSON, XML, and nested tables -- all of which are better alternatives.
The need to remove an element suggests a poor data design.
You can convert the list to rows using an XMLTABLE, filter to remove the unwanted rows and then re-aggregate them:
SELECT LISTAGG( x.value.getStringVal(), ',' ) WITHIN GROUP ( ORDER BY idx )
FROM XMLTABLE(
( '1,2,3,4,5' )
COLUMNS value XMLTYPE PATH '.',
idx FOR ORDINALITY
) x
WHERE x.value.getStringVal() != 3;
For a simple filter this is probably not worth it and you should use something like (based on @mathguy's solution):
SELECT SUBSTR( new_list, 2, LENGTH( new_list ) - 2 ) AS new_list
FROM (
SELECT REPLACE(
REPLACE(
',' || REPLACE( :list, ',', ',,' ) || ',',
',' || :value_to_replace || ','
),
',,',
','
) AS new_list
FROM DUAL
)
However, if the filtering is more complicated then it might be worth converting the list to rows, filtering and re-aggregating.
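The "convert to rows, filter, re-aggregate" shape is easy to see in a short Python sketch, shown here only to illustrate the pattern and not as a replacement for the Oracle query. The name remove_items and the idea of passing a set of unwanted values are assumptions of this sketch.

```python
def remove_items(csv_text, unwanted):
    """Split the list into items, drop the unwanted ones, and rejoin in order."""
    items = csv_text.split(",")                          # list -> rows
    kept = [item for item in items if item not in unwanted]  # filter
    return ",".join(kept)                                # re-aggregate

print(remove_items("1,2,3,4,5", {"3"}))  # 1,2,4,5
```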
I do not know how to do this in Oracle, but with SQL Server I'd use a trick:
convert the list to XML by replacing the comma with tags
use XQuery to filter the data
reconcatenate
This is SQL Server syntax, but it might point you in the right direction:
declare @s varchar(100)='1,2,2,3,3,4';
declare @exclude int=3;
WITH Casted AS
(
    SELECT CAST('<x>' + REPLACE(@s,',','</x><x>') + '</x>' AS XML) AS TheXml
)
SELECT x.value('.','int')
FROM Casted
CROSS APPLY TheXml.nodes('/x[text()!=sql:variable("@exclude")]') AS A(x)
UPDATE
I just found this answer which seems to show pretty well how to start...
I agree with Gordon regarding the fact that storing comma delimited data in a column is a really bad idea.
I just precede the csv with a ',', then use the replace function followed by a left trim function to clean up the preceding ','.
SCOTT@tst>VAR b_number varchar2(5);
SCOTT@tst>EXEC :b_number:= '3';
PL/SQL procedure successfully completed.
SCOTT@tst>WITH srce AS (
    SELECT
        ',' || '3,1,2,3,3,4,5,3' col
    FROM
        dual
) SELECT
    ltrim(replace(col,',' ||:b_number),',') col
FROM
    srce;
COL
1,2,4,5

find comma in oracle sql string

I have a character column in my Oracle db, and the data stored in it looks like this:
30.170527093355002,72.615875338654 and
30.805165,71.82474
Now I want to split the whole string at the comma. That is, I want to get the part of the string before the comma and the part after the comma separately. Is there any built-in function that can split my whole string at the comma, regardless of where in the string the comma occurs? I have already tried the floor function and substr, but in vain. Please help me use any built-in or user-defined function to fulfill this requirement.
select
substr( COLNAME, 1, instr( COLNAME, ',') - 1 ) as p_1 ,
substr( COLNAME, instr( COLNAME, ',', - 1 ) + 1 ) as p_2
from YOURTABLE
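The query takes the text before the first comma and the text after the last comma. A minimal Python sketch of the same split is below, useful for checking the expected halves of the sample values; split_at_comma is a hypothetical name, and the sketch assumes each value contains exactly one comma, as in the question.

```python
def split_at_comma(text):
    """Return (part before the first comma, part after the last comma)."""
    before, _, _ = text.partition(",")   # everything left of the first comma
    _, _, after = text.rpartition(",")   # everything right of the last comma
    return before, after

print(split_at_comma("30.805165,71.82474"))  # ('30.805165', '71.82474')
```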