Oracle - concatenate one or more rows into different result columns - SQL

I have a table structure with data which looks like:
EventNbr | NoteNbr | NoteText
-------- | ------- | ---------------
1        | 1       | Example title
1        | 2       | text1
1        | 3       | text2
2        | 4       | Example title 2
3        | 5       | Example title 3
3        | 6       | text3
What I need as a result is a data set which looks like
EventNbr | Title          | Notes
-------- | -------------- | -----------
1        | Example Title  | text1,text2
2        | Example Title2 |
3        | Example Title3 | text3
I am basically taking the NoteText of the minimum NoteNbr for each EventNbr and putting it in the Title column; every NoteNbr after the minimum is then concatenated together with commas in the Notes column.
What I currently have works, but only for EventNbrs which have multiple NoteNbr rows. It does not work for items which only have one NoteNbr row, like EventNbr 2 above.
SELECT A.EventNbr,
       MIN(A.NoteText) AS Title,
       LISTAGG(A.NoteText, ',') WITHIN GROUP (ORDER BY A.NoteNbr) AS Notes
FROM EventNote A
INNER JOIN (SELECT MIN(NoteNbr) Min_NoteNbr, EventNbr
            FROM EventNote
            GROUP BY EventNbr) B
    ON (A.NoteNbr <> B.Min_NoteNbr AND A.EventNbr = B.EventNbr)
INNER JOIN EventNote C
    ON (C.NoteNbr = B.Min_NoteNbr AND C.EventNbr = B.EventNbr)
GROUP BY A.EventNbr;
Result:
EventNbr | Title          | Notes
-------- | -------------- | -----------
1        | Example Title  | text1,text2
3        | Example Title3 | text3
What do I need to add to consider scenarios where there is only one NoteNbr row?

You can use conditional aggregation and row_number():
select eventnbr,
       max(case when seqnum = 1 then notetext end) as title,
       listagg(case when seqnum > 1 then notetext end, ',') within group (order by seqnum) as notes
from (select en.*,
             row_number() over (partition by eventnbr order by notenbr) as seqnum
      from eventnote en
     ) en
group by eventnbr;

It may be best to run the aggregation first. This will produce almost the result you need, except it will still concatenate the title at the beginning of the notes. That can be corrected after the fact, using standard string functions substr, instr and concatenation. (The latter is needed to deal with the "exceptional case" you mentioned, when there are no actual notes.)
The advantage is that the extra operation is performed only on the output - expected to be (far?) fewer rows than the input - and it is a trivial string manipulation rather than an additional layer of sorting and partitioning.
Something like this - assuming the inputs are all in a single table (as the first part of your question implies) rather than in different tables (as your existing code suggests). I included a with clause to simulate the input table.
Note - the execution plan shows that the optimizer is smart enough to merge the subquery into the main query; the plan consists of a single SELECT operation over an aggregation (GROUP BY). all_notes is replaced with its long definition as a listagg right in the subquery, and the outer query is completely eliminated. So we have the best of both worlds: a query that can be read, but the execution is as efficient as possible.
with
  eventnote(eventnbr, notenbr, notetext) as (
    select 1, 1, 'Example title'   from dual union all
    select 1, 2, 'text1'           from dual union all
    select 1, 3, 'text2'           from dual union all
    select 2, 4, 'Example title 2' from dual union all
    select 3, 5, 'Example title 3' from dual union all
    select 3, 6, 'text3'           from dual
  )
select eventnbr, title,
       substr(all_notes, instr(all_notes || ',', ',') + 1) as notes
from (
       select eventnbr,
              min(notetext) keep (dense_rank first order by notenbr) as title,
              listagg(notetext, ',') within group (order by notenbr) as all_notes
       from   eventnote
       group  by eventnbr
     )
order by eventnbr
;
EVENTNBR  TITLE            NOTES
--------  ---------------  ------------------------------
       1  Example title    text1,text2
       2  Example title 2
       3  Example title 3  text3

How to limit the number of groups in a query, but not the number of rows in Oracle?
If I had to do that manually, I would have to use a DISTINCT.
It would be something like this:
FOR d IN (
    SELECT DISTINCT COLUMN_1
    FROM myTable
    WHERE myDate BETWEEN x AND y
    OFFSET o ROWS
    FETCH NEXT l ROWS ONLY
) LOOP
And then, do the selects from each of the ids returned in the query, which, in my opinion, is a terrible solution.
SAMPLE DATA:
If I limit the number of groups to 2 by using COLUMN_2, the expected result should be something like:
I believe you may be looking for something like this:
select *
from mytable
where id in (
    select distinct id
    from mytable
    where my_date between x and y
    fetch first :n rows only
);
:n is a bind variable, encoding the number of groups you want to select.
This should be more efficient than solutions using analytic functions - even if it must read the base table twice. In tests posted on OTN, I showed that the difference is not small.
EDIT If I remember correctly, FETCH is not implemented in the most efficient way (perhaps for good reasons, having to do with features we don't need in this query - such as how to deal with ties). FETCH itself resembles a DENSE_RANK() implementation rather than the faster row limiting clause (using ROWNUM). I would likely need to modify the query to do away with FETCH, if speed was really important. END EDIT
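For illustration, a sketch of that rewrite - the FETCH clause replaced by ROWNUM applied to an inline view (the same technique used in the benchmark code further down), with table and column names taken from the query above:
select *
from mytable
where id in (
    select id
    from (select distinct id
          from mytable
          where my_date between x and y)
    where rownum <= :n
);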
Further edit to do with performance comparisons
Frequent poster MT0 requested a pointer for the claim that aggregate solutions can be (and often are) more efficient than analytic function approaches, even when the former may require multiple passes through the data while the analytic function approach requires only one.
Alas, OTN (what now calls itself the "Oracle Groundbreakers Developer Community", the discussion board hosted by Oracle itself) went through a massive - and massively botched - platform change at the end of September 2020; that messed up both the search facilities and the formatting of old posts, to the point of rendering them almost unusable.
Instead, I will show here a simple mock-up of the OP's problem in this thread; code that anyone can run so they can repeat the tests on their own machine.
I created a table with two columns, ID and STR - the ID plays the same role as in the OP's question, and STR is just extra payload to mimic real-life data. ID is number and STR is varchar2(100). I populated the table with 9 million rows - 1 million ID's, nine rows for each ID. The task is to select just three "groups" (three distinct ID's, then select all the rows from the base table for those three distinct ID's).
With no index on the ID column, the aggregate solution runs in 0.81 seconds on my machine; with an index on ID, it runs in 0.47 seconds. The analytic functions solution runs in 0.91 seconds, with or without an index (obviously - there is no way an index can benefit the analytic function solution). All these results are for column ID not declared NOT NULL.
Here is the code to create the table, the index on ID, and the two queries I tested. Note: as I explained in my first edit (above), fetch is slow; I replaced it with a standard row-limiting technique using ROWNUM in an outer query.
drop table t purge;
create table t (id number, str varchar2(100));

insert into t
with row_gen as (select level from dual connect by level <= 3000)
select mod(344227 * rownum, 1000000), rpad('x', 100, 'x')
from row_gen cross join row_gen
;
commit;

create index t_idx on t(id);

select *
from t
where id in (
    select id from (select distinct id from t)
    where rownum <= 3
);

select *
from (select t.*, dense_rank() over (order by id) dr from t)
where dr <= 3;
You can use DENSE_RANK:
SELECT *
FROM (
    SELECT t.*,
           DENSE_RANK() OVER ( ORDER BY column2 ) AS rnk
    FROM table_name t
)
WHERE rnk <= 2;
Which, for the sample data:
CREATE TABLE table_name ( column1, column2, column3, column4 ) AS
SELECT 1, 1, 1.0, 1.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.1 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.2 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.3 FROM DUAL UNION ALL
SELECT 3, 3, 3.0, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 4, 4, 4.2, 4.0 FROM DUAL;
Outputs:
COLUMN1 | COLUMN2 | COLUMN3 | COLUMN4 | RNK
------: | ------: | ------: | ------: | --:
1 | 1 | 1 | 1 | 1
2 | 2 | 2 | 2 | 2
2 | 2 | 2.2 | 2.1 | 2
2 | 2 | 2.2 | 2.2 | 2
2 | 2 | 2 | 2.3 | 2
(and, if you want DISTINCT rows then add DISTINCT to the outer query)
db<>fiddle here
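If you do want only the distinct rows, a minimal sketch of that variant (same table_name sample data, DISTINCT added to the outer query) would be:
SELECT DISTINCT *
FROM (
    SELECT t.*,
           DENSE_RANK() OVER ( ORDER BY column2 ) AS rnk
    FROM table_name t
)
WHERE rnk <= 2;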
If I understand correctly, you want ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
      FROM myTable t
      WHERE t.myDate BETWEEN x AND y
     ) t
WHERE seqnum = 1;
This returns an arbitrary row for each id meeting the conditions.

How to concatenate multiple columns vertically efficiently

For the end goal, I want to create a table that looks something like this:
Table 1
option_ID | person_ID | option
--------- | --------- | ------
1         | 1         | B
2         | 1         |
3         | 2         | C
4         | 2         | A
5         | 3         | A
6         | 3         | B
The idea is that a person can choose up to 2 options out of 3 (in this case person 1 only chose 1 option). However, my raw data puts the chosen options into one single column, i.e.:
Table 2
person_ID | option
--------- | ------
1         | B
2         | C,A
3         | A,B
What I usually do is use the 'Text to Columns' function with the ',' delimiter in Excel, and then manually concatenate the resulting columns vertically. However, I find this method impractical with more options (say 10 or even 20). Is there a way to get from Table 2 to Table 1 efficiently using PostgreSQL or some other method?
You can use the string_agg() function:
select person_ID, string_agg(option, ',') as option
from table1
group by person_ID
You can use regexp_split_to_table():
select row_number() over () as id,
       t.person_id, v.option
from t cross join lateral
     regexp_split_to_table(t.option, ',') as v(option)
order by person_id, option;
Here is a db<>fiddle.
Actually, if you want exactly two rows per person_id:
select row_number() over () as id, t.person_id, v.option
from t cross join lateral
     (values (1, split_part(t.option, ',', 1)),
             (2, split_part(t.option, ',', 2))
     ) v(pos, option)
order by person_id, pos;
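If the number of options grows to the 10 or 20 mentioned in the question, a sketch that avoids hard-coding one split_part call per position - assuming the raw data sits in the same table t(person_id, option) - could use unnest ... with ordinality instead. Note it only produces as many rows as options actually chosen, so person 1 would get one row rather than a row plus a blank:
select row_number() over (order by t.person_id, u.pos) as option_id,
       t.person_id,
       u.opt as option
from t
cross join lateral unnest(string_to_array(t.option, ',')) with ordinality as u(opt, pos)
order by t.person_id, u.pos;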

Translate code string into desc in Hive

Here we have a hyphenated string like 0-1-3..., and the length is not fixed.
We also have a DETAIL table in Hive that explains the meaning of each code.
DETAIL
| code | desc |
+ ---- + ---- +
| 0 | AAA |
| 1 | BBB |
| 2 | CCC |
| 3 | DDD |
Now we need a Hive query to convert the code string into a description string.
For example, the string 0-1-3 should produce AAA-BBB-DDD.
Any advice on how to do that?
Split your string to get an array, explode the array, and join with the detail table (a CTE is used in my example instead; use your normal table) to get each desc joined with its code. Then assemble the string using collect_list(desc) to get an array plus concat_ws() to get the concatenated string:
select concat_ws('-', collect_list(d.desc)) as code_desc
from
( --initial string explode
  select explode(split('0-1-3','-')) as code
) s
inner join
( -- use your table instead of this subquery
  select 0 code, 'AAA' desc union all
  select 1, 'BBB' desc union all
  select 2, 'CCC' desc union all
  select 3, 'DDD' desc
) d on s.code = d.code;
Result:
OK
AAA-BBB-DDD
Time taken: 114.798 seconds, Fetched: 1 row(s)
In case you need to preserve the original order, use posexplode: it returns each element as well as its position in the original array. Then you can order by record ID and pos before collect_list().
If your string is a table column then use lateral view to select exploded values.
This is more complicated example with order preserved and lateral view.
select str as original_string, concat_ws('-', collect_list(s.desc)) as transformed_string
from
(
  select s.str, s.pos, d.desc
  from
  ( --initial string explode with ordering by str and pos
    --(better use your table PK, like ID, instead of str for ordering)
    select str, pos, code
    from ( --use your table instead of this subquery
           select '0-1-3' as str union all
           select '2-1-3' as str union all
           select '3-2-1' as str
         ) s
    lateral view outer posexplode(split(s.str, '-')) v as pos, code
  ) s
  inner join
  ( -- use your table instead of this subquery
    select 0 code, 'AAA' desc union all
    select 1, 'BBB' desc union all
    select 2, 'CCC' desc union all
    select 3, 'DDD' desc
  ) d on s.code = d.code
  distribute by s.str -- this should be record PK candidate
  sort by s.str, s.pos --sort on each reducer
) s
group by str;
Result:
OK
0-1-3 AAA-BBB-DDD
2-1-3 CCC-BBB-DDD
3-2-1 DDD-CCC-BBB
Time taken: 67.534 seconds, Fetched: 3 row(s)
Note that distribute + sort is used instead of simply order by str, pos. distribute + sort works in fully distributed mode; order by would also be correct, but it runs on a single reducer.
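For smaller data sets where a single reducer is acceptable, the same query could - as an untested sketch - simply use a global order by in place of the distribute by / sort by pair:
select str as original_string, concat_ws('-', collect_list(s.desc)) as transformed_string
from
(
  select s.str, s.pos, d.desc
  from
  ( select str, pos, code
    from ( select '0-1-3' as str union all
           select '2-1-3' as str union all
           select '3-2-1' as str
         ) s
    lateral view outer posexplode(split(s.str, '-')) v as pos, code
  ) s
  inner join
  ( select 0 code, 'AAA' desc union all
    select 1, 'BBB' desc union all
    select 2, 'CCC' desc union all
    select 3, 'DDD' desc
  ) d on s.code = d.code
  order by s.str, s.pos -- global ordering, single reducer
) s
group by str;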

Oracle SQL: Limiting multiple where clauses

Apologies if this seems like a duplicate of this question, but I believe my use case is slightly different.
I have two tables.
Table1
ID           | INTCODE
------------ | -------
000019827364 | 1
000019829201 | 2
890418392101 | 3
890418390395 | 4
890418398677 | 5
505586578932 | 6
505586578914 | 7
505586578933 | 8
505586578012 | 9
490201827383 | 10
490201827466 | 11
001952046578 | 12

Table2
INTCODE | Category
------- | --------
1       | Display
2       | Display
3       | Display
4       | Display
5       | Display
6       | Audio
7       | Audio
8       | Audio
9       | Audio
10      | Audio
11      | Audio
12      | Audio
My expected query result is all possible 5-digit prefixes for each category, and for each of these prefixes I want to extract at least 2 full IDs. Below is an example if I had a where clause for the category 'Display'.
ID PREFIX | Category | ID
--------- | -------- | ------------
00001     | Display  | 000019827364
00001     | Display  | 000019829201
89041     | Display  | 890418392101
89041     | Display  | 890418390395
The query I currently have is
SELECT
    SUBSTR(t1.ID, 1, 5)
FROM
    table1 t1,
    table2 t2
WHERE
    UPPER(t2.category) = 'DISPLAY'
    AND t2.REGION_ID = 1
    AND t2.ZONE_ID = 2
    AND t1.REGION_ID = 1
    AND t1.ZONE_ID = 2
    AND t1.INTCODE = t2.INTCODE
GROUP BY
    SUBSTR(t1.ID, 1, 5)
I am now kind of lost. Should I be running another query where I say
t1.ID LIKE '00001%'
OR t1.ID LIKE '89041%'
This list would go on to be huge because some of the categories have 400-500 prefixes. Is there a better way to go about this? Possibly in a single query?
I'm using Oracle SQL.
Many thanks!
You can use row_number() for this:
select Category, ID, IDPrefix
from (select t2.Category, t1.ID, SUBSTR(t1.ID, 1, 5) as IDPrefix,
             ROW_NUMBER() OVER (PARTITION BY SUBSTR(t1.ID, 1, 5) ORDER BY t1.ID) as seqnum
      FROM table1 t1 JOIN
           table2 t2
           ON t1.INTCODE = t2.INTCODE AND
              t1.Region_id = t2.Region_id AND
              t1.zone_id = t2.zone_id
      WHERE UPPER(t2.category) = 'DISPLAY'
     ) t
WHERE seqnum <= 2;
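If prefixes with only a single matching ID should be excluded altogether (the question asks for at least 2 full IDs per prefix), a hedged variation of the same idea adds a count window next to the row number; the cnt alias is introduced here purely for illustration:
select Category, ID, IDPrefix
from (select t2.Category, t1.ID, SUBSTR(t1.ID, 1, 5) as IDPrefix,
             ROW_NUMBER() OVER (PARTITION BY SUBSTR(t1.ID, 1, 5) ORDER BY t1.ID) as seqnum,
             COUNT(*) OVER (PARTITION BY SUBSTR(t1.ID, 1, 5)) as cnt
      FROM table1 t1
      JOIN table2 t2
        ON t1.INTCODE = t2.INTCODE
       AND t1.Region_id = t2.Region_id
       AND t1.zone_id = t2.zone_id
      WHERE UPPER(t2.category) = 'DISPLAY'
     ) t
WHERE seqnum <= 2
  AND cnt >= 2;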
Assuming you want to display two rows with different IDs, without any more constraints, you could simply use a union where the first query selects the max ID and the second query the min ID.
So your query would look something like this
select id_prefix, category, max(id)
from yourTable
group by id_prefix, category
union
select id_prefix, category, min(id)
from yourTable
group by id_prefix, category
Now simply add to this algorithm your where conditions.
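Applied to the tables in the question, a rough sketch of that approach might look like the following (id_prefix is derived with SUBSTR; the region/zone filters from the original query would be added to both branches in the same way):
select substr(t1.ID, 1, 5) as id_prefix, t2.category, max(t1.ID) as id
from table1 t1
join table2 t2 on t1.INTCODE = t2.INTCODE
where upper(t2.category) = 'DISPLAY'
group by substr(t1.ID, 1, 5), t2.category
union
select substr(t1.ID, 1, 5), t2.category, min(t1.ID)
from table1 t1
join table2 t2 on t1.INTCODE = t2.INTCODE
where upper(t2.category) = 'DISPLAY'
group by substr(t1.ID, 1, 5), t2.category
order by 1, 3;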

How to select the top n from a union of two queries where the resulting order needs to be ranked by individual query?

Let's say I have a table with usernames:
Id | Name
-----------
1 | Bobby
20 | Bob
90 | Bob
100 | Joe-Bob
630 | Bobberino
820 | Bob Junior
I want to return a list of n matches on name for 'Bob' where the resulting set first contains exact matches followed by similar matches.
I thought something like this might work
SELECT TOP 4 a.* FROM
(
    SELECT * FROM Usernames WHERE Name = 'Bob'
    UNION
    SELECT * FROM Usernames WHERE Name LIKE '%Bob%'
) AS a
but there are two problems:
It's an inefficient query since the sub-select could return many rows (looking at the execution plan shows a join happening before top)
(Almost) more importantly, the exact match(es) will not appear first in the results since the resulting set appears to be ordered by primary key.
I am looking for a query that will return (for TOP 4)
Id | Name
---------
20 | Bob
90 | Bob
(and then 2 results from the LIKE query, e.g. 1 Bobby and 100 Joe-Bob)
Is this possible in a single query?
You could use a case to place the exact matches on top:
select top 4 *
from Usernames
where Name like '%Bob%'
order by
case when Name = 'Bob' then 1 else 2 end
Or, if you're worried about performance and have an index on (Name):
select top 4 *
from (
    select 1 as SortOrder
         , *
    from Usernames
    where Name = 'Bob'
    union all
    select 2
         , *
    from Usernames
    where Name like '%Bob%'
      and Name <> 'Bob'
      and 4 >
          (
              select count(*)
              from Usernames
              where Name = 'Bob'
          )
) as SubqueryAlias
order by
    SortOrder
A slight modification to your original query should solve this. You could add an additional UNION branch that matches WHERE Name LIKE 'Bob%' and give it priority 2, changing the '%Bob%' priority to 3, and you'd get an even better search IMHO (a sketch of that three-tier version follows the query below).
SELECT TOP 4 a.* FROM
(
    SELECT *, 1 AS Priority FROM Usernames WHERE Name = 'Bob'
    UNION
    SELECT *, 2 FROM Usernames WHERE Name LIKE '%Bob%'
) AS a
ORDER BY Priority ASC
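A minimal sketch of that three-tier version, assuming the same Usernames table (UNION ALL plus the guard conditions keeps each row in exactly one tier):
SELECT TOP 4 a.Id, a.Name FROM
(
    SELECT *, 1 AS Priority FROM Usernames WHERE Name = 'Bob'
    UNION ALL
    SELECT *, 2 FROM Usernames WHERE Name LIKE 'Bob%' AND Name <> 'Bob'
    UNION ALL
    SELECT *, 3 FROM Usernames WHERE Name LIKE '%Bob%' AND Name NOT LIKE 'Bob%'
) AS a
ORDER BY a.Priority ASC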
This might do what you want with better performance.
SELECT TOP 4 a.* FROM
(
    SELECT TOP 4 *, 1 AS Sort FROM Usernames WHERE Name = 'Bob'
    UNION ALL
    SELECT TOP 4 *, 2 AS Sort FROM Usernames WHERE Name LIKE '%Bob%' AND Name <> 'Bob'
) AS a
ORDER BY Sort
This works for me:
SELECT TOP 4 * FROM (
    SELECT 1 AS Rank, I, Name FROM Foo WHERE Name = 'Bob'
    UNION ALL
    SELECT 2 AS Rank, I, Name FROM Foo WHERE Name LIKE '%Bob%'
) AS Q1
ORDER BY Q1.Rank, Q1.I
SET ROWCOUNT 4

SELECT * FROM Usernames WHERE Name = 'Bob'
UNION
SELECT * FROM Usernames WHERE Name LIKE '%Bob%'

SET ROWCOUNT 0
The answer from Will A got me over the line, but I'd like to add a quick note: if you're trying to do the same thing and incorporate "FOR XML PATH", you need to write it slightly differently.
I was specifying XML attributes and so had things like:
SELECT Field_1 as [#attr_1]
What you have to do is remove the "#" symbol in the sub-queries and then add it back in the outer query. Like this:
SELECT TOP 1 a.SupervisorName AS [#SupervisorName]
FROM
(
    SELECT (FirstNames + ' ' + LastName) AS [SupervisorName], 1 AS OrderingVal
    FROM ExamSupervisor SupervisorTable1
    UNION ALL
    SELECT (FirstNames + ' ' + LastName) AS [SupervisorName], 2 AS OrderingVal
    FROM ExamSupervisor SupervisorTable2
) AS a
ORDER BY a.OrderingVal ASC
FOR XML PATH('Supervisor')
This is a cut-down version of my final query, so it doesn't really make sense, but you should get the idea.