Hive: how to eliminate duplicated substrings

Hive table:
create table T (i int, s string);
insert into T values
(1, "a1&b2"),
(1, "b2&c3"),
(2, "c1&d2"),
(2, "c1");
The s column contains values separated by &.
The desired output should group by the first column and concatenate the s column values, keeping only unique substrings (separated by &):
i grouped_s
-- -------------
1 a1&b2&c3
2 c1&d2
Here is my attempt:
SELECT i,
       concat_ws('&',
         collect_set(
           split(concat_ws('&', collect_set(s)), "&")
         )
       ) as grouped_s
FROM T
group by i;
I got this:
FAILED: SemanticException [Error 10128]: Line 6:24 Not yet supported place for UDAF 'collect_set'
Also, I would like to do this without nested SQL.

Use lateral view + explode:
select t.i, concat_ws('&',collect_set(e.val)) as grouped_s
from T t
lateral view outer explode(split(t.s,'&')) e as val
group by t.i;
Result:
t.i grouped_s
1 a1&b2&c3
2 c1&d2
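Note that collect_set removes duplicates but does not guarantee any particular order. If you need the substrings in a deterministic order, you could sort the collected set before concatenating (a minimal sketch, assuming Hive's built-in sort_array):
select t.i, concat_ws('&', sort_array(collect_set(e.val))) as grouped_s
from T t
lateral view outer explode(split(t.s, '&')) e as val
group by t.i;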

Related

Possible to un-anonymize VALUES clause?

Let's say I have the following as a starting point:
select * from (values (1,'a'),(2,'b'))
Is it possible to provide column names to the value columns up-stream, for example something like:
select
col1 AS id,
col2 AS letter
from (
<anonymous values>
)
Or is it the case that once you have an anonymous VALUES clause you cannot name its columns?
You can use a table alias that also specifies column names:
select *
from (
values (1,'a'),(2,'b')
) as v(id, letter);
Use an alias:
select * from (values (1,'a'),(2,'b')) as foo(id,txt);
id | txt
----+-----
1 | a
2 | b
(2 rows)
Absolutely. Just add the column names to the query alias:
select q.id, q.col
from (values
(1,'a'),
(2,'b')
) q(id, col)
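The same column aliasing also works inside a CTE, which is convenient when the rest of the query needs to refer to the columns by name (a small sketch, assuming PostgreSQL as in the answers above):
with v(id, letter) as (
  values (1, 'a'), (2, 'b')
)
select id, letter
from v
where letter = 'b';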

Compare a single-column row-set with another single-column row set in Oracle SQL

Is there any Oracle SQL operator or function that compares two result sets and tells whether they are exactly the same or not? Currently my idea is to use the MINUS operator in both directions, but I am looking for a better and more performant solution. One result set is fixed (see below), the other depends on the records.
Very important: I am not allowed to change the schema and structure, so CREATE TABLE, CREATE TYPE, etc. are not allowed here. Also important: the solution must work on Oracle 11g.
The schema for SQL Fiddle is:
CREATE TABLE DETAILS (ID INT, MAIN_ID INT, VALUE INT);
INSERT INTO DETAILS VALUES (1,1,1);
INSERT INTO DETAILS VALUES (2,1,2);
INSERT INTO DETAILS VALUES (3,1,3);
INSERT INTO DETAILS VALUES (4,1,4);
INSERT INTO DETAILS VALUES (5,2,1);
INSERT INTO DETAILS VALUES (6,2,2);
INSERT INTO DETAILS VALUES (7,3,1);
INSERT INTO DETAILS VALUES (7,3,2);
Now this is my SQL query that does the job (it selects the MAIN_IDs whose VALUEs are exactly the same as the given list):
SELECT DISTINCT D.MAIN_ID FROM DETAILS D WHERE NOT EXISTS
(SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID
MINUS
SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2)))
AND NOT EXISTS
(SELECT * FROM TABLE(SYS.ODCINUMBERLIST(1, 2))
MINUS
SELECT VALUE FROM DETAILS WHERE MAIN_ID=D.MAIN_ID)
The SQL Fiddle link: http://sqlfiddle.com/#!4/25dde/7/0
If you use a collection (rather than a VARRAY) then you can aggregate the values into a collection and directly compare two collections:
CREATE TYPE int_list AS TABLE OF INT;
Then:
SELECT main_id
FROM details
GROUP BY main_id
HAVING CAST( COLLECT( value ) AS int_list ) = int_list( 1, 2 );
Outputs:
| MAIN_ID |
| ------: |
| 2 |
| 3 |
db<>fiddle here
Update
Based on your expanded fiddle in comments, you can use:
SELECT B.ID
FROM BUSINESS_DATA B
INNER JOIN BUSINESS_NAME N
ON ( B.NAME_ID=N.ID )
WHERE N.NAME='B1'
AND EXISTS (
SELECT business_id
FROM ORDERS O
LEFT OUTER JOIN TABLE(
SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' )
) d
ON ( o.orderdate = d.COLUMN_VALUE )
WHERE O.BUSINESS_ID=B.ID
GROUP BY business_id
HAVING COUNT( CASE WHEN d.COLUMN_VALUE IS NULL THEN 1 END ) = 0
AND COUNT( DISTINCT o.orderdate )
= ( SELECT COUNT(DISTINCT COLUMN_VALUE) FROM TABLE( SYS.ODCIDATELIST( DATE '2021-01-03', DATE '2020-04-07', DATE '2020-05-07' ) ) )
)
(Note: Do not implicitly create dates from strings; it will cause the query to fail, without there being any changes to the query text, if a user changes their NLS_DATE_FORMAT session parameter. Instead use TO_DATE with an appropriate format model or a DATE literal.)
db<>fiddle here
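Since the question rules out CREATE TYPE, a count-based comparison against the built-in SYS.ODCINUMBERLIST is another option, structured like the query in the update above (a sketch under that constraint, using the fixed list (1, 2) from the question):
SELECT d.main_id
FROM   details d
       LEFT JOIN TABLE(SYS.ODCINUMBERLIST(1, 2)) l
              ON d.value = l.column_value
GROUP  BY d.main_id
HAVING COUNT(CASE WHEN l.column_value IS NULL THEN 1 END) = 0
   AND COUNT(DISTINCT d.value) =
       (SELECT COUNT(DISTINCT column_value) FROM TABLE(SYS.ODCINUMBERLIST(1, 2)));
A MAIN_ID qualifies only when none of its rows fall outside the list and it covers every distinct value in the list.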

Concatenate comma delimited column values and remove duplicates

I need to concatenate comma delimited column values stored in an Oracle table, and remove duplicated values when I concatenate them. I'm new to Oracle and not sure where to start. Can someone help me achieve the following in Oracle 11g?
Table:
rec_id affiliations
1 P,QE,D
2 EE,ED-D
1 QE,PO-D, D
2 A,EE
Desired output:
rec_id affiliations
1 P,QE,D,PO-D
2 EE,ED-D,A,EE
The first part of this query parses the input into a separate row for each affiliation; the final select concatenates them into a single list for each rec_id.
with parsed as (
  select distinct
         rec_id,
         ltrim(regexp_substr(',' || affiliations, ',([^,])+', 1, i), ',') k
  from t, (select rownum i from dual connect by level <= 100)
  where regexp_substr(',' || affiliations, ',([^,])+', 1, i) is not null
)
select distinct
       rec_id,
       listagg(k, ',') within group (order by k) over (partition by rec_id) affiliations
from parsed
order by rec_id;
Adjust the number (e.g. 100) to the maximum number of items you'd expect to see in the input.
http://sqlfiddle.com/#!4/37b44/4
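If you do not want to guess an upper bound, the row generator can be driven by regexp_count instead of a fixed 100 (a sketch in the same spirit, assuming Oracle 11g's regexp_count; table and column names as in the question, with trim added to absorb the stray space in "PO-D, D"):
with bound as (
  select max(regexp_count(affiliations, ',')) + 1 as n
  from t
),
numbers as (
  select level i
  from bound
  connect by level <= n
),
parsed as (
  select distinct
         t.rec_id,
         trim(regexp_substr(t.affiliations, '[^,]+', 1, numbers.i)) k
  from t
  cross join numbers
  where regexp_substr(t.affiliations, '[^,]+', 1, numbers.i) is not null
)
select rec_id,
       listagg(k, ',') within group (order by k) affiliations
from parsed
group by rec_id
order by rec_id;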

DB2 Comma Separated Output by Groups

Is there a built in function for comma separated column values in DB2 SQL?
Example: If there are columns with an ID and it has 3 rows with the same ID but have three different roles, the data should be concatenated with a comma.
ID | Role
------------
4555 | 2
4555 | 3
4555 | 4
The output should look like the following, per row:
4555 2,3,4
The LISTAGG function is a new function in DB2 LUW 9.7.
See the example:
create table myTable (id int, category int);
insert into myTable values (1, 1);
insert into myTable values (2, 2);
insert into myTable values (5, 1);
insert into myTable values (3, 1);
insert into myTable values (4, 2);
Example: select without any ordering within the grouped column
select category, LISTAGG(id, ', ') as ids from myTable group by category;
result:
CATEGORY IDS
--------- -----
1 1, 5, 3
2 2, 4
Example: select with an ORDER BY clause within the grouped column
select
category,
LISTAGG(id, ', ') WITHIN GROUP(ORDER BY id ASC) as ids
from myTable
group by category;
result:
CATEGORY IDS
--------- -----
1 1, 3, 5
2 2, 4
I think this smaller query does what you want.
It is the equivalent of MySQL's GROUP_CONCAT in DB2.
SELECT
NUM,
SUBSTR(xmlserialize(xmlagg(xmltext(CONCAT( ', ',ROLES))) as VARCHAR(1024)), 3) as ROLES
FROM mytable
GROUP BY NUM;
This will output something like:
NUM ROLES
---- -------------
1 111, 333, 555
2 222, 444
assuming your original result was something like this:
NUM ROLES
---- ---------
1 111
2 222
1 333
2 444
1 555
Depending on the DB2 version you have, you can use XML functions to achieve this.
Example table with some data
create table myTable (id int, category int);
insert into myTable values (1, 1);
insert into myTable values (2, 2);
insert into myTable values (3, 1);
insert into myTable values (4, 2);
insert into myTable values (5, 1);
Aggregate results using xml functions
select category,
xmlserialize(XMLAGG(XMLELEMENT(NAME "x", id) ) as varchar(1000)) as ids
from myTable
group by category;
results:
CATEGORY IDS
-------- ------------------------
1 <x>1</x><x>3</x><x>5</x>
2 <x>2</x><x>4</x>
Use replace to make the result look better
select category,
replace(
replace(
replace(
xmlserialize(XMLAGG(XMLELEMENT(NAME "x", id) ) as varchar(1000))
, '</x><x>', ',')
, '<x>', '')
, '</x>', '') as ids
from myTable
group by category;
Cleaned result
CATEGORY IDS
-------- -----
1 1,3,5
2 2,4
Just saw a better solution using XMLTEXT instead of XMLELEMENT here.
Since DB2 9.7.5 there is a function for that:
LISTAGG(colname, separator)
check this for more information: Using LISTAGG to Turn Rows of Data into a Comma Separated List
My problem was to transpose row fields (CLOB) into a column (VARCHAR) as a CSV and use the transposed table for reporting, because transposing at the report layer slows the report down.
One way to go is to use recursive SQL. You can find many articles about that, but it's difficult and resource-consuming if you want to join all your recursive transposed columns.
I created multiple global temp tables where I stored single transposed columns with one key identifier. Eventually I had 6 temp tables for joining 6 columns, but due to limited resource allocation I wasn't able to bring all the columns together. I opted for the three formulas below, and then I just had to run one query, which gave me output in 10 seconds.
I found various articles on using the XML2CLOB functions and ended up with 3 different ways.
REPLACE(VARCHAR(XML2CLOB(XMLAGG(XMLELEMENT(NAME "A", ALIASNAME.ATTRIBUTENAME)))), '</A><A>', ',') AS TRANSPOSED_OUTPUT
NVL(TRIM(',' FROM REPLACE(REPLACE(REPLACE(CAST(XML2CLOB(XMLAGG(XMLELEMENT(NAME "E", ALIASNAME.ATTRIBUTENAME))) AS VARCHAR(100)),'',' '),'',','), '', 'Nothing')), 'Nothing') as TRANSPOSED_OUTPUT
RTRIM(REPLACE(REPLACE(REPLACE(VARCHAR(XMLSERIALIZE(XMLAGG(XMLELEMENT(NAME "A", ALIASNAME.ATTRIBUTENAME) ORDER BY ALIASNAME.ATTRIBUTENAME) AS CLOB)), '</A><A>', ','), '<A>', ''), '</A>', '')) AS TRANSPOSED_OUTPUT
Make sure you are casting your "ATTRIBUTENAME" to varchar in a subquery and then calling it here.
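As a sketch of that advice, combining it with the XMLSERIALIZE/REPLACE pattern shown earlier (mytable, num and attr_clob are hypothetical names; the inner CAST is where the CLOB attribute is converted to VARCHAR):
SELECT s.num,
       REPLACE(
         REPLACE(
           REPLACE(
             XMLSERIALIZE(XMLAGG(XMLELEMENT(NAME "x", s.attr)) AS VARCHAR(1000)),
             '</x><x>', ','),
           '<x>', ''),
         '</x>', '') AS transposed_output
FROM (
  SELECT num, CAST(attr_clob AS VARCHAR(100)) AS attr
  FROM   mytable
) s
GROUP BY s.num;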
Another possibility, with a recursive CTE:
with tablewithrank as (
  select id, category,
         rownumber() over(partition by category order by id) as rangid,
         (select count(*) from myTable f2 where f1.category = f2.category) as nbidbycategory
  from myTable f1
),
cte (id, category, rangid, nbidbycategory, rangconcat) as (
  select id, category, rangid, nbidbycategory, cast(id as varchar(500))
  from tablewithrank
  where rangid = 1
  union all
  select f2.id, f2.category, f2.rangid, f2.nbidbycategory,
         cast(f1.rangconcat as varchar(500)) || ',' || cast(f2.id as varchar(500))
  from cte f1
  inner join tablewithrank f2
    on f1.rangid = f2.rangid - 1 and f1.category = f2.category
)
select category, rangconcat as IDS
from cte
where rangid = nbidbycategory
Try this:
SELECT GROUP_CONCAT( field1, field2, field3 ,field4 SEPARATOR ', ')
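Note that GROUP_CONCAT is MySQL syntax and will not run on DB2 9.7; the equivalent for the ID/Role table in the question is LISTAGG, as shown above (a short sketch, assuming the table is named mytable):
SELECT id,
       LISTAGG(role, ',') WITHIN GROUP (ORDER BY role) AS roles
FROM   mytable
GROUP  BY id;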