Fast way to generate concatenated strings in Oracle [duplicate]

This question already has answers here:
SQL Query to concatenate column values from multiple rows in Oracle
(10 answers)
Don't we hate it when evil code comes back to haunt us?
Some time ago I needed to generate a string concatenating some fields for further processing later. I thought it would be a good idea to do it straight in the query, and used SO's help to get it. It worked. For a while...
The table got too big, and now that trick (which I know is super inefficient) is no longer viable. This is what I'm doing:
with my_tabe as
(
select 'user1' as usrid, '1' as prodcode from dual union
select 'user1' as usrid, '2' as prodcode from dual union
select 'user1' as usrid, '3' as prodcode from dual union
select 'user2' as usrid, '2' as prodcode from dual union
select 'user2' as usrid, '3' as prodcode from dual union
select 'user2' as usrid, '4' as prodcode from dual
)
select
usrid,
ltrim(sys_connect_by_path(prodcode, '|'), '|') as prodcode
from
(
select distinct prodcode, usrid,count(1)
over (partition by usrid) as cnt,
row_number() over (partition by usrid order by prodcode) as rn
from my_tabe
)
where
rn = cnt
start with rn = 1
connect by prior rn + 1 = rn
and prior usrid = usrid
Which nicely yields:
USRID PRODCODE
user1 1|2|3
user2 2|3|4
The evil part, as you might have noticed, is the where rn = cnt filter. If you remove it, you'll see all the work Oracle is (I suppose) really doing:
USRID PRODCODE
user1 1
user1 1|2
user1 1|2|3
user2 2
user2 2|3
user2 2|3|4
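To make the cost concrete, here is a minimal Python sketch (not Oracle code) of what the unfiltered query returns for one user: one row per prefix of the ordered codes. With n products that is n rows holding 1 + 2 + ... + n codes in total, roughly all generated before rn = cnt discards every row but the last. The helper names are hypothetical.

```python
def connect_by_path_rows(codes, sep="|"):
    """Mimic the unfiltered SYS_CONNECT_BY_PATH output for one user:
    one row per prefix of the ordered product codes."""
    ordered = sorted(codes)
    return [sep.join(ordered[: i + 1]) for i in range(len(ordered))]


def total_path_codes(n):
    """Codes materialised across all prefix rows for n products
    (1 + 2 + ... + n), versus n for a single aggregated string."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    print(connect_by_path_rows(["1", "2", "3"]))  # ['1', '1|2', '1|2|3']
    print(total_path_codes(3), total_path_codes(1000))  # 6 500500
```

The quadratic total is why the trick is fine for small groups but degrades badly as groups and tables grow.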
I'm actually using this in many places where there aren't many records, and it works fine up to about half a million records.
Recently I tried the same on a table with ~15 million records, and, well... no good.
Question: is there a more efficient way to do this in Oracle, or is it time to move it into the actual application code?
This is not a core issue, so I can still afford a kludge, as long as it's fast...
Worth mentioning: there is an index on the usrid column I'm using.
cheers,

Tom Kyte provides a very convenient way to do this with a custom aggregate function, which works from Oracle 9i onward. It aggregates with commas, but you can modify the function body to use pipes.
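An Oracle custom aggregate is built on its user-defined aggregate interface (an object type with initialize, iterate, merge and terminate routines). As a rough analogue only, here is a Python sketch using the standard library's sqlite3.Connection.create_aggregate, which follows the same idea: a step method accumulates values per group and finalize returns the joined string. The aggregate name pipe_agg is an assumption for illustration, and sorting inside finalize is a choice made here because SQLite does not promise the order in which step sees rows.

```python
import sqlite3

# Sample rows from the question's my_tabe CTE.
ROWS = [("user1", "1"), ("user1", "2"), ("user1", "3"),
        ("user2", "2"), ("user2", "3"), ("user2", "4")]


class PipeAgg:
    """User-defined aggregate: collects non-NULL values and pipe-joins them."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group.
        if value is not None:
            self.values.append(str(value))

    def finalize(self):
        # Sorting makes the result independent of row visit order.
        return "|".join(sorted(self.values))


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tabe (usrid TEXT, prodcode TEXT)")
    conn.executemany("INSERT INTO my_tabe VALUES (?, ?)", ROWS)
    conn.create_aggregate("pipe_agg", 1, PipeAgg)
    return conn


def concatenated_products(conn):
    sql = "SELECT usrid, pipe_agg(prodcode) FROM my_tabe GROUP BY usrid ORDER BY usrid"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(concatenated_products(make_conn()))  # [('user1', '1|2|3'), ('user2', '2|3|4')]
```

Each group is scanned once, so the work grows linearly with the number of rows instead of building every prefix path.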
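Starting with Oracle 11g Release 2, you can use LISTAGG:
SELECT grouping_columns, LISTAGG(column, separator) WITHIN GROUP (ORDER BY field)
FROM dataSource
GROUP BY grouping_columns
For the question's my_tabe, that would be:
SELECT usrid, LISTAGG(prodcode, '|') WITHIN GROUP (ORDER BY prodcode) AS prodcode
FROM my_tabe
GROUP BY usrid;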
This web page provides additional methods, including the one you listed, which is indeed not very efficient.
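If you take the question's other route and move the concatenation into application code, one option is to let the database only sort (ORDER BY usrid, prodcode) and join each user's codes in a single streaming pass. A minimal Python sketch, with an in-memory SQLite table standing in for the Oracle table; the function name concat_client_side is hypothetical.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# Sample rows from the question's my_tabe CTE.
ROWS = [("user1", "1"), ("user1", "2"), ("user1", "3"),
        ("user2", "2"), ("user2", "3"), ("user2", "4")]


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_tabe (usrid TEXT, prodcode TEXT)")
    conn.executemany("INSERT INTO my_tabe VALUES (?, ?)", ROWS)
    return conn


def concat_client_side(conn, sep="|"):
    """Let the database sort; join each user's codes in one pass."""
    cur = conn.execute("SELECT usrid, prodcode FROM my_tabe ORDER BY usrid, prodcode")
    result = {}
    # groupby only merges consecutive rows, which the ORDER BY guarantees per usrid.
    for usrid, rows in groupby(cur, key=itemgetter(0)):
        result[usrid] = sep.join(prodcode for _, prodcode in rows)
    return result


if __name__ == "__main__":
    print(concat_client_side(make_conn()))  # {'user1': '1|2|3', 'user2': '2|3|4'}
```

Because rows are consumed as they arrive, memory stays bounded by one user's codes rather than the whole result set.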

Related

stack or union multiple fields in MS Access

Beginner's question here... I have a table of tree measurements with 3 fields: ID, Diameter_1, Diameter_2
and I want to get to these 3 fields: ID, DiameterName, DiameterMeasurement
[Image: input and desired output]
My query so far:
SELECT DISTINCT ID, Diameter_1
FROM tblDiameters
UNION SELECT DISTINCT ID, Diameter_2
FROM tblDiameters;
However, it returns only 2 fields. How can I bring in the DiameterMeasurement field?
Many thanks :-)
You were on the right track to use a union. Here is one viable approach:
SELECT ID, 'Diameter_1' AS DiameterName, Diameter_1 AS DiameterMeasurement
FROM tblDiameters
UNION ALL
SELECT ID, 'Diameter_2', Diameter_2
FROM tblDiameters
ORDER BY ID, DiameterName;
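The same UNION ALL unpivot can be tried outside Access. Here is a minimal Python sketch running the answer's query against an in-memory SQLite table; the sample measurements are hypothetical, since the question's data was only shown as an image. UNION ALL is used rather than UNION because there is nothing to de-duplicate, and UNION would add a duplicate-removal step.

```python
import sqlite3


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblDiameters (ID INTEGER, Diameter_1 REAL, Diameter_2 REAL)")
    # Hypothetical sample measurements.
    conn.executemany("INSERT INTO tblDiameters VALUES (?, ?, ?)",
                     [(1, 10.5, 11.0), (2, 8.0, 8.5)])
    return conn


def unpivot_diameters(conn):
    """One row per (ID, measurement): each SELECT labels its source column."""
    sql = """
        SELECT ID, 'Diameter_1' AS DiameterName, Diameter_1 AS DiameterMeasurement
        FROM tblDiameters
        UNION ALL
        SELECT ID, 'Diameter_2', Diameter_2
        FROM tblDiameters
        ORDER BY ID, DiameterName
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in unpivot_diameters(make_conn()):
        print(row)
```

The column names of the combined result come from the first SELECT, which is why only that branch needs the aliases.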

How can I select the max count values from a query grouped by two columns?

Sorry for the bad title; obviously I need to get better at explaining my problem.
I'm practicing queries on the AdventureWorks data in SQL Server, and I wrote this query:
SELECT a.City, pc.Name, COUNT(pc.Name) AS 'Count'
FROM SalesLT.SalesOrderHeader AS oh
JOIN SalesLT.SalesOrderDetail AS od ON oh.SalesOrderID = od.SalesOrderID
JOIN SalesLT.Product AS p ON od.ProductID = p.ProductID
JOIN SalesLT.ProductCategory AS pc ON p.ProductCategoryID = pc.ProductCategoryID
JOIN SalesLT.Address AS a ON oh.ShipToAddressID = a.AddressID
GROUP BY a.City, pc.Name
ORDER BY a.City;
Which gives:
City Name Count
---- ------ ------
Abingdon Cleaners 7
Abingdon Vests 16
Abingdon Pedals 29
Alhambra Jersey 44
Alhambra Vests 16
Auburn Hydration Packs 7
Auburn Derailleurs 8
I'm trying to get only the item with the largest count in each city. The expected output looks like this:
City Name Count
---- ------ ------
Abingdon Pedals 29
Alhambra Jersey 44
Auburn Derailleurs 8
That is, the max count for each city. Since I don't have a sample dataset, I'm not asking for the exact query, but can you give me some idea of what to look into? I've tried different GROUP BYs, but I always get this far and can never move forward.
Also, the exercise was "Identify the three most important cities. Show the break down of top-level product category against the city." If possible, can you share how you would approach this problem? I've been trying to improve my SQL skills, but I have a hard time writing complex queries. I'd greatly appreciate any tips on approaching complex queries.
Any guidance on how I should approach this problem would be appreciated.
In this case you want the maximum count per city, which you can get with a subquery.
Here is a simplified example for reference:
with cte as (
select '1' as id, '1' as val
union all
select '2' as id, '1' as val
union all
select '2' as id, '2' as val
union all
select '3' as id, '1' as val
union all
select '3' as id, '1' as val
union all
select '3' as id, '1' as val
union all
select '3' as id, '2' as val
union all
select '3' as id, '2' as val
),
a as(
select id,val,count(id) as cou
from cte
group by id,val
)
select * from (
select *,max(cou) over(partition by id) as max_cou from a
) b
where cou = max_cou
The first CTE (cte) is just dummy data, and the second CTE (a) stands in for the OP's current query. The rest is the solution.
As I recall, SQL Server does not accept nested aggregate functions such as MAX(COUNT(...)), so a subquery is a much easier approach.
If anyone has a cleaner and simpler query, I'll be happy to see it too :D
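Here is the same windowed MAX applied to the counts from the question, as a Python sketch against SQLite (window functions need SQLite 3.25 or newer). The table city_counts is a hypothetical stand-in for the result of the GROUP BY a.City, pc.Name query; its rows are copied from the question's output.

```python
import sqlite3

# Rows copied from the grouped output shown in the question.
SAMPLE = [
    ("Abingdon", "Cleaners", 7), ("Abingdon", "Vests", 16), ("Abingdon", "Pedals", 29),
    ("Alhambra", "Jersey", 44), ("Alhambra", "Vests", 16),
    ("Auburn", "Hydration Packs", 7), ("Auburn", "Derailleurs", 8),
]


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city_counts (City TEXT, Name TEXT, Cnt INTEGER)")
    conn.executemany("INSERT INTO city_counts VALUES (?, ?, ?)", SAMPLE)
    return conn


def top_item_per_city(conn):
    sql = """
        SELECT City, Name, Cnt
        FROM (
            SELECT City, Name, Cnt,
                   MAX(Cnt) OVER (PARTITION BY City) AS max_cnt
            FROM city_counts
        ) AS b
        WHERE Cnt = max_cnt   -- ties for first place all survive
        ORDER BY City
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in top_item_per_city(make_conn()):
        print(row)
```

Filtering on the windowed maximum keeps every tied item; if exactly one row per city is required, ROW_NUMBER() OVER (PARTITION BY City ORDER BY Cnt DESC) = 1 is the usual alternative.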

SQL unique combinations

I have a table with three columns: an ID, a therapeutic class, and a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insulin
3 CJ6 maleate
3 MG9 glargine
I need to first look at each patient's combination of therapeutic classes and generic names, and then count how many patients have the same combination. I want the output to have three columns: the count of patients with the combination, the combination of generic names, and the combination of therapeutic classes, like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique IDs already existed in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 bytes (or 32767 bytes in Oracle 12c and later with MAX_STRING_SIZE = EXTENDED), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. That solution is more complicated; I will not cover it unless it is needed for your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
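Outside the database, matching whole sets of pairs is straightforward. Here is a Python sketch of that idea in application code, not SQL: each patient's (therapeutic_class, generic_name) pairs become a frozenset, which compares by content, so the patients group on the set itself and no length limit applies. The rows are the question's sample data, with ID 3's generic name spelled insulin as the expected output implies; the function name is hypothetical.

```python
from collections import Counter

QUESTION_ROWS = [
    (1, "YG4", "insulin"), (1, "CJ6", "maleate"), (1, "MG9", "glargine"),
    (2, "C4C", "diaoxy"), (2, "KR3", "supplies"),
    (3, "YG4", "insulin"), (3, "CJ6", "maleate"), (3, "MG9", "glargine"),
]


def count_combinations(rows):
    """rows: (id, therapeutic_class, generic_name) tuples.
    Returns (count, generic_combo, therapeutic_combo), most common first."""
    pairs_by_id = {}
    for pid, tclass, gname in rows:
        pairs_by_id.setdefault(pid, set()).add((tclass, gname))

    counts = Counter(frozenset(pairs) for pairs in pairs_by_id.values())

    result = []
    for pair_set, cnt in counts.most_common():
        ordered = sorted(pair_set)  # one shared order keeps each pair aligned
        generics = ", ".join(g for _, g in ordered)
        classes = ", ".join(t for t, _ in ordered)
        result.append((cnt, generics, classes))
    return result


if __name__ == "__main__":
    for row in count_combinations(QUESTION_ROWS):
        print(row)
```

The display strings are built only after grouping, and both come from the same sorted list of pairs, so the nth class and the nth generic name always belong together.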
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;
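One thing to be aware of with this query: because each LISTAGG is ordered by its own column, the nth class and the nth generic name in the two strings need not come from the same row, so patients with different pairings can land in the same group. A small Python illustration with hypothetical values:

```python
def independent_lists(pairs):
    """Mimic the query above: each list is sorted by its own column."""
    classes = ", ".join(sorted(t for t, _ in pairs))
    generics = ", ".join(sorted(g for _, g in pairs))
    return classes, generics


# Hypothetical patients: same values, different (class, name) pairings.
patient_a = {("A", "x"), ("B", "y")}
patient_b = {("A", "y"), ("B", "x")}

if __name__ == "__main__":
    print(independent_lists(patient_a))  # ('A, B', 'x, y')
    print(independent_lists(patient_b))  # ('A, B', 'x, y')
```

Ordering both LISTAGG calls by the same key (for example the pair identifier, as in the previous answer) avoids this.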

Custom SORT BY SQL

I'm new to the community but have referenced it many times in the past. I have an issue I'm trying to overcome in Access, specifically with sorting (ORDER BY) in SQL.
Long story short, I need to create a report based on the results of several different queries. I used a UNION query to get around the "Query is too complex" error. The results of the query aren't in the order I'd like, though.
Since this UNION query is not based on one specific table but on the results of many queries, I'm not able to sort by a specific column header.
I want to sort the results in the order the SELECTs are written in the SQL statement. Can anyone provide some insight into how to do this? I've attempted several different ways but always end up with an error message. Here's the code; any help is greatly appreciated.
SELECT [Aqua-Anvil_Total].Expr1
FROM [Aqua-Anvil_Total];
UNION SELECT [Aqua-Reslin_Total].Expr1
FROM [Aqua-Reslin_Total];
UNION SELECT [Aqua_Zenivex_Total].Expr1
FROM [Aqua_Zenivex_Total];
UNION SELECT [Aqualuer_20-20_Total].Expr1
FROM [Aqualuer_20-20_Total];
UNION SELECT [Avalon_Total].Expr1
FROM [Avalon_Total];
UNION SELECT [BVA_13_Total].Expr1
FROM [BVA_13_Total];
UNION SELECT [Deltagard_Total].Expr1
FROM [Deltagard_Total];
UNION SELECT [Envion_Total].Expr1
FROM [Envion_Total];
UNION SELECT [Scourge_18-54_Total].Expr1
FROM [Scourge_18-54_Total];
UNION SELECT [Zenivex_E20_Total].Expr1
FROM [Zenivex_E20_Total];
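Add a constant sort column to each SELECT, wrap the union in a subquery, and order by that column in the outer query. This uses UNION ALL instead of UNION, so if you are relying on UNION to remove duplicates, there is more work to do afterwards.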
select Expr1
from (
select [Aqua-Anvil_Total].Expr1, 0 as sort
from [Aqua-Anvil_Total]
union all select [Aqua-Reslin_Total].Expr1, 1 as sort
from [Aqua-Reslin_Total]
union all select [Aqua_Zenivex_Total].Expr1, 2 as sort
from [Aqua_Zenivex_Total]
union all select [Aqualuer_20-20_Total].Expr1, 3 as sort
from [Aqualuer_20-20_Total]
union all select [Avalon_Total].Expr1, 4 as sort
from [Avalon_Total]
union all select [bva_13_Total].Expr1, 5 as sort
from [bva_13_Total]
union all select [Deltagard_Total].Expr1, 6 as sort
from [Deltagard_Total]
union all select [Envion_Total].Expr1, 7 as sort
from [Envion_Total]
union all select [Scourge_18-54_Total].Expr1, 8 as sort
from [Scourge_18-54_Total]
union all select [Zenivex_E20_Total].Expr1, 9 as sort
from [Zenivex_E20_Total]
) as u
order by u.sort
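The constant-sort-key trick is not specific to Access. A minimal Python sketch against SQLite, with three hypothetical one-row tables standing in for the saved "total" queries; the branches are deliberately written in non-alphabetical order to show that the output follows the written order.

```python
import sqlite3

# Hypothetical one-row "total" queries, created here as tables.
TOTALS = {"aqua_anvil_total": "Aqua-Anvil", "avalon_total": "Avalon",
          "zenivex_total": "Zenivex"}


def make_conn():
    conn = sqlite3.connect(":memory:")
    for table, value in TOTALS.items():
        conn.execute(f"CREATE TABLE {table} (Expr1 TEXT)")
        conn.execute(f"INSERT INTO {table} VALUES (?)", (value,))
    return conn


def ordered_union(conn):
    # Each branch carries a constant recording where it was written.
    sql = """
        SELECT Expr1
        FROM (
            SELECT Expr1, 0 AS sort FROM zenivex_total
            UNION ALL SELECT Expr1, 1 AS sort FROM aqua_anvil_total
            UNION ALL SELECT Expr1, 2 AS sort FROM avalon_total
        ) AS u
        ORDER BY u.sort
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(ordered_union(make_conn()))  # ['Zenivex', 'Aqua-Anvil', 'Avalon']
```

The sort column is used only for ordering and is not selected by the outer query, so the report still sees just Expr1.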

SQL using Count, with same "Like" multiple times in same cell

I'm trying to count how many times each BNxxxx value has been mentioned in the COMMENTS cell. So far, each cell is counted only once, but a single cell may contain several BNxxxx values.
For example, this:
-------
BN0012
-------
BN0012
-------
BN0012
BN0123
-------
should show an output of BN0012 3 times and BN0123 once. Instead, I get BN0012 3 times only.
Here's my code:
select COMMENTS, count(*) as TOTAL
from NOTE
Where COMMENTS like '%BN%' AND CREATE_DATE between '01/1/2015' AND '11/03/2015'
group by COMMENTS
order by Total desc;
Any ideas?
Edit:
My code now looks like this:
select BRIDGE_NO, count(*)
from IACD_ASSET b join
IACD_NOTE c
on c.COMMENTS like concat(concat('BN',b.BRIDGE_NO),'%')
Where c.CREATE_DATE between '01/1/2015' AND '11/03/2015' AND length(b.BRIDGE_NO) > 1
group by b.BRIDGE_NO
order by count(*);
The problem is that BN44 also matches BN4455. I tried concat(concat('BN',b.BRIDGE_NO),'_'), but that returns nothing. Any ideas how I can get exact matches?
You have a problem. Let me assume that you have a table of all known BN values that you care about. Then you can do something like:
select bn.fullbn, count(*)
from tableBN bn join
comments c
on c.comments like ('%' || bn.fullbn || '%')
group by bn.fullbn;
The performance of this might be quite slow.
If you happen to be storing lists of things in the comment field, then this is a very bad idea. You should not store lists in strings; you should use a junction table.
I'm going to assume that your COMMENTS table has a primary key column (such as comment_id) or at least that comments isn't a CLOB. If it is a CLOB then you're not going to be able to use GROUP BY on that column.
You can accomplish this as follows without even a lookup table of BN.... values. No guarantees as to the performance:
WITH d1 AS (
SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
, date'2015-01-03' AS create_date
FROM dual
UNION ALL
SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
, date'2015-02-03' AS create_date
FROM dual
)
SELECT comment_id, comments, COUNT(*) AS total FROM (
SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
FROM d1
WHERE create_date >= date'2015-01-01'
AND create_date < date'2015-11-04'
CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
AND PRIOR comment_id = comment_id
AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL
) GROUP BY comment_id, comments;
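The query above splits each comment into one row per BN code with CONNECT BY LEVEL. The same tokenising idea as a Python sketch, with hypothetical function names: it uses lookarounds ((?<!\S) and (?!\S)), which Oracle's regex engine does not support, so that codes separated by a single space (such as "BN1 BN2") are not consumed by the previous match. The second function counts mentions per code, which is what the question asks for.

```python
import re
from collections import Counter

# A BN code delimited by whitespace or the start/end of the text.
BN_TOKEN = re.compile(r"(?<!\S)BN\d+(?!\S)", re.IGNORECASE)


def bn_counts_per_comment(comments):
    """comments: dict comment_id -> text. Returns comment_id -> number of BN codes."""
    return {cid: len(BN_TOKEN.findall(text)) for cid, text in comments.items()}


def bn_counts_per_code(comments):
    """Total mentions of each BN code across all comments."""
    return Counter(m.upper() for text in comments.values() for m in BN_TOKEN.findall(text))


if __name__ == "__main__":
    sample = {1: "BN0123 is a terrible thing BN0121 also BN0000",
              2: "BN0125 is a terrible thing BN0120 also BN1000"}
    print(bn_counts_per_comment(sample))  # {1: 3, 2: 3}
```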
Note that I corrected your filter:
CREATE_DATE between '01/1/2015' AND '11/03/2015'
First, you should be using ANSI date literals (e.g., date'2015-01-01'); second, using BETWEEN for dates is often a bad idea as Oracle DATE values contain a time portion. So this should be rewritten as:
create_date >= date'2015-01-01'
AND create_date < date'2015-11-04'
Note that the later date is November 4, to make sure we capture all possible comments that were made on November 3.
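The difference is easy to see with timestamps. A Python sketch with hypothetical helper names, where a date-only upper bound behaves like midnight at the start of that day, as it does when an Oracle DATE is compared with a date literal:

```python
from datetime import date, datetime, timedelta


def in_between(ts, first_day, last_day):
    """Mimic BETWEEN with date-only bounds: the upper bound is midnight of last_day."""
    start = datetime.combine(first_day, datetime.min.time())
    end = datetime.combine(last_day, datetime.min.time())
    return start <= ts <= end


def in_half_open_range(ts, first_day, last_day):
    """ts >= first_day AND ts < last_day + 1 day: covers all of last_day."""
    start = datetime.combine(first_day, datetime.min.time())
    end_exclusive = datetime.combine(last_day + timedelta(days=1), datetime.min.time())
    return start <= ts < end_exclusive


if __name__ == "__main__":
    afternoon = datetime(2015, 11, 3, 14, 30)
    print(in_between(afternoon, date(2015, 1, 1), date(2015, 11, 3)))  # False
    print(in_half_open_range(afternoon, date(2015, 1, 1), date(2015, 11, 3)))  # True
```

A comment made on the afternoon of November 3 is missed by the BETWEEN form but caught by the half-open range.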
If you want to see the matched comments without aggregating the counts, then do the following (taking out the outer query, basically):
WITH d1 AS (
SELECT 1 AS comment_id, 'BN0123 is a terrible thing BN0121 also BN0000' AS comments
, date'2015-01-03' AS create_date
FROM dual
UNION ALL
SELECT 2 AS comment_id, 'BN0125 is a terrible thing BN0120 also BN1000' AS comments
, date'2015-02-03' AS create_date
FROM dual
)
SELECT comment_id, comments, TRIM(REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i')) AS bn
FROM d1
WHERE create_date >= date'2015-01-01'
AND create_date < date'2015-11-04'
CONNECT BY REGEXP_SUBSTR(comments, '(^|\s)BN\d+(\s|$)', 1, LEVEL, 'i') IS NOT NULL
AND PRIOR comment_id = comment_id
AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL;
Given the edits to your question, I think you want something like the following:
SELECT b.bridge_no, COUNT(*) AS comment_cnt
FROM iacd_asset b INNER JOIN iacd_note c
ON REGEXP_LIKE(c.comments, '(^|\W)BN' || b.bridge_no || '(\W|$)', 'i')
WHERE c.create_date >= date'2015-01-01'
AND c.create_date < date'2015-03-12' -- It just struck me that your dates are dd/mm/yyyy
AND length(b.bridge_no) > 1
GROUP BY b.bridge_no
ORDER BY comment_cnt;
Note that I am using \W in the regex above instead of \s as I did earlier to make sure that it captures things like BN1234/BN6547.
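The effect of the \W boundaries on the BN44 vs BN4455 problem can be checked in Python, whose re module accepts the same (^|\W)...(\W|$) pattern. A sketch with hypothetical comments and a hypothetical function name; like the join above, it counts matching comments per bridge number.

```python
import re


def count_comments_per_bridge(bridge_nos, comments):
    """For each bridge number, count comments mentioning BN<number> as a whole token."""
    counts = {}
    for bn in bridge_nos:
        # re.escape guards against regex metacharacters in the bridge number.
        pattern = re.compile(r"(^|\W)BN" + re.escape(bn) + r"(\W|$)", re.IGNORECASE)
        counts[bn] = sum(1 for text in comments if pattern.search(text))
    return counts


if __name__ == "__main__":
    comments = ["Checked BN4455 today", "BN44/BN6547 repaired",
                "BN44 and BN4455", "BN4455 again"]
    print(count_comments_per_bridge(["44", "4455"], comments))  # {'44': 2, '4455': 3}
```

BN44 no longer matches inside BN4455, because the character after "BN44" must be a non-word character or the end of the text.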
Try using the DISTINCT keyword in your SELECT statement to pull in unique values for the comments, like this:
select distinct COMMENTS, count(*) as TOTAL
from NOTE
Where COMMENTS like '%BN%' AND CREATE_DATE between '01/1/2015' AND '11/03/2015'
group by COMMENTS
order by Total desc;