Oracle SQL: Match long, comma separated list

We use Oracle 11g, and I am accessing a database from our manufacturing execution system on which I don't have any write permissions; for example, I can't use temporary tables. I have to retrieve the data for a long list of production IDs. The format of the production IDs is flexible, so I can also use a different format, but I would expect a comma-separated list to be rather handy. My problem is how to filter my request against such a long list. Using IN, as in
SELECT
productionid,
tool,
proddate
FROM
proddb
WHERE
productionid in ('123','231','312', ...)
is not very fast, is limited to 1000 entries, and I guess is not designed for this scenario. So what would be the best approach to filter on a large list of hundreds or thousands of production IDs?

As has already been said, the best way would be to find a table or set of tables that returns the required IDs through simple criteria (grouping data, dates, etc.).
But if you absolutely MUST run your query against a huge set of arbitrary IDs, and if consistency is not an issue for you (i.e., the data won't change much between queries), you could also split the query into 4 or 5 queries with 1000 items each.
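The splitting described above is usually done on the client side. The following is only a minimal sketch in Python: it uses an in-memory SQLite table as a stand-in for proddb, and the helper names chunked and fetch_by_ids are hypothetical. The `?` placeholders are SQLite's bind style; an Oracle driver would use its own bind syntax, and the chunk size would be 1000 rather than the tiny demo value.

```python
import sqlite3
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def fetch_by_ids(conn, ids, chunk_size=1000):
    """Run one IN query per chunk and concatenate the rows."""
    rows = []
    for batch in chunked(ids, chunk_size):
        placeholders = ",".join("?" for _ in batch)
        sql = (
            "SELECT productionid, tool, proddate FROM proddb "
            f"WHERE productionid IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, batch).fetchall())
    return rows

# Demo data: an in-memory SQLite table standing in for the real proddb.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proddb (productionid TEXT, tool TEXT, proddate TEXT)")
conn.executemany(
    "INSERT INTO proddb VALUES (?, ?, ?)",
    [(str(i), f"tool{i % 3}", "2020-01-01") for i in range(1, 11)],
)

wanted = ["2", "4", "6", "8", "99"]
result = fetch_by_ids(conn, wanted, chunk_size=2)  # tiny chunks just for the demo
print(sorted(row[0] for row in result))
```

Because each chunk is a separate query, the combined result is only consistent if the data does not change between the queries, which is the caveat mentioned above.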

Option 1: Use a collection:
CREATE TYPE string_list IS TABLE OF VARCHAR2(20);
Then:
SELECT productionid, tool, proddate
FROM proddb
WHERE productionid MEMBER OF string_list('123','231','312',...);
or, using one of the built-in SYS.ODCI*LIST types:
SELECT productionid, tool, proddate
FROM proddb
WHERE productionid IN (SELECT column_value
FROM TABLE(SYS.ODCIVARCHAR2LIST('123','231','312',...)));
Option 2: Use a multi-dimensional IN list to bypass the 1000 item restriction:
SELECT productionid, tool, proddate
FROM proddb
WHERE (productionid, 1) IN ((123,1),(231,1),(321,1),...);
Option 3: Use a subquery:
SELECT productionid, tool, proddate
FROM proddb
WHERE productionid IN (
SELECT 123 FROM DUAL UNION ALL
SELECT 213 FROM DUAL UNION ALL
SELECT 321 FROM DUAL -- UNION ALL ...
)
or:
SELECT productionid, tool, proddate
FROM proddb p
INNER JOIN (
SELECT 123 AS id FROM DUAL UNION ALL
SELECT 213 FROM DUAL UNION ALL
SELECT 321 FROM DUAL -- UNION ALL ...
) i
ON (p.productionid = i.id)
or
WITH filters AS (
SELECT 123 AS id FROM DUAL UNION ALL
SELECT 213 FROM DUAL UNION ALL
SELECT 321 FROM DUAL -- UNION ALL ...
)
SELECT productionid, tool, proddate
FROM proddb p
INNER JOIN filters f
ON (p.productionid = f.id)

Related

Why does dbms_random.value return the same value in graph queries (connect by)?

On Oracle 11.2.0.4.0, when I run the following query then each row gets a different result:
select r.n from (
select trunc(dbms_random.value(1, 100)) n from dual
) r
connect by level < 100; -- returns random values
But as soon as I use the obtained random value in a join or subquery then each row gets the same value from dbms_random.value:
select r.n, (select r.n from dual) from (
select trunc(dbms_random.value(1, 100)) n from dual
) r
connect by level < 100; -- returns the same value each time
Is it possible to make the second query return random values for each row?
UPDATE
My example was maybe over-simplified, here's what I am trying to do:
with reservations(val) as (
select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 5 from dual union all
select 8 from dual
)
select * from (
select rnd.val, CONNECT_BY_ISLEAF leaf from (
select trunc(dbms_random.value(1, 10)) val from dual
) rnd
left outer join reservations res on res.val = rnd.val
connect by res.val is not null
)
where leaf = 1;
But with reservations which can go from 1 to 1.000.000.000 (and more).
Sometimes that query returns correctly (if it immediately picked a random value for which there was no reservation), and sometimes it gives an out-of-memory error because it always retries with the same value of dbms_random.value.

Your comment "...and I want to avoid concurrency problems" made me think.
Why don't you just try to insert a random number, watch out for duplicate violations, and retry until successful? Even a very clever solution that looks up available numbers might come up with identical new numbers in two separate sessions. So, only an inserted and committed reservation number is safe.
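A minimal sketch of that insert-and-retry idea, written in Python against an in-memory SQLite table rather than Oracle. The table layout, the function name reserve_random and the max_tries limit are assumptions made for illustration; the point is only that the unique constraint, not a prior lookup, decides whether a number is free.

```python
import random
import sqlite3

def reserve_random(conn, low, high, max_tries=100, rng=random):
    """Insert a random value in [low, high]; retry on duplicate-key errors."""
    for _ in range(max_tries):
        candidate = rng.randint(low, high)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO reservations (val) VALUES (?)", (candidate,)
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # value already taken, draw again
    raise RuntimeError("no free value found within max_tries")

# Demo data: the same reservations as in the question (1, 3, 4, 5, 8).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (val INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO reservations VALUES (?)", [(1,), (3,), (4,), (5,), (8,)])
conn.commit()

new_val = reserve_random(conn, 1, 9)
print(new_val)  # one of the free values 2, 6, 7 or 9
```

With a sparsely populated range, the first attempt almost always succeeds, so the retry loop rarely runs more than once.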

You can move the connect-by clause inside the subquery:
select r.n, (select r.n from dual) from (
select trunc(dbms_random.value(1, 100)) n from dual
connect by level < 100
) r;
         N (SELECTR.NFROMDUAL)
---------- -------------------
        90                  90
        69                  69
        15                  15
        53                  53
         8                   8
         3                   3
...
You wrote: "what I try to do is generate a sequence of random numbers and find the first one for which I don't have a record in some table"
You could potentially do something like:
select r.n
from (
select trunc(dbms_random.value(1, 100)) n from dual
connect by level < 100
) r
where not exists (
select id from your_table where id = r.n
)
and rownum = 1;
but it will generate all 100 random values before checking any of them, which is a bit wasteful. And as you might not find a gap in those 100 (and there may be duplicates within them), you either need to generate every value in a much larger range, which is also expensive but doesn't need as many random calls:
select n
from (
select r.n
from (
select level n from dual
connect by level < 100 -- or entire range of possible values
) r
where not exists (
select id from your_table where id = r.n
)
order by dbms_random.value -- shuffle the free values before picking one
)
where rownum = 1;
Or repeat a single check until a match is found.
Another approach is to have a look-up table of all possible IDs with a column indicating if they are used or free, maybe with a bitmap index; and then use that to find the first (or any random) free value. But then you have to maintain that table too, and update atomically as you use and release the IDs in your main table, which means making things more complicated and serialising access - though you probably can't avoid that anyway really if you don't want to use a sequence. You could probably use a materialised view to simplify things.
And if you have a relatively small number of gaps (and you really want to reuse those) then you could possibly only search for a gap within the assigned range and then fall back to a sequence if there are no gaps. Say you only have values in the range 1 to 1000 currently used, with a few missing; you could look for a free value in that 1-1000 range, and if there are none then use a sequence to get 1001 instead, rather than always including your entire possible range of values in your gap search. That would also fill in gaps in preference to extending the used range, which may or may not be useful. (I'm not sure if "I don't need those numbers to be consecutive" means they should not be consecutive, or that it doesn't matter).
Unless you particularly have a business need to fill in the gaps and for the assigned values to not be consecutive, though, I'd just use a sequence and ignore the gaps.
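To make the "fill gaps first, then extend the range" idea concrete, here is a small sketch in plain Python. The function name next_id and the in-memory set of used values are assumptions for illustration only; it ignores concurrency entirely, which a real implementation would have to handle in the database.

```python
def next_id(used):
    """Return the lowest free value inside the used range, else max + 1."""
    if not used:
        return 1
    upper = max(used)
    for candidate in range(1, upper):
        if candidate not in used:
            return candidate  # reuse a gap inside the assigned range
    return upper + 1  # no gaps: extend the range, as a sequence would

print(next_id({1, 3, 4, 5, 8}))  # 2
print(next_id({1, 2, 3}))        # 4
```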

I managed to obtain a correct result with the following query but I am not sure if this approach is really advisable:
with
reservations(val) as (
select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 5 from dual union all
select 8 from dual
),
rand(v) as (
select trunc(dbms_random.value(1, 10)) from dual
),
next_res(v, ok) as (
select v, case when exists (select 1 from reservations r where r.val = rand.v) then 0 else 1 end from rand
),
recursive(i, v, ok) AS (
select 0, 0, 0 from dual
union all
select i + 1, next_res.v, next_res.ok from recursive, next_res where i < 100 /*maxtries*/ and recursive.ok = 0
)
select v from recursive where ok = 1;

SQL unique combinations

I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID  therapeutic_class  generic_name
1   YG4                insulin
1   CJ6                maleate
1   MG9                glargine
2   C4C                diaoxy
2   KR3                supplies
3   YG4                insulin
3   CJ6                maleate
3   MG9                glargine
I need to first look at the individual combinations of therapeutic class and generic name and then want to count how many patients have the same combination. I want my output to have three columns: one being the combo of generic names, the combo of therapeutic classes and the count of the number of patients with the combination like this:
Count  Combination_generic         combination_therapeutic
2      insulin, maleate, glargine  YG4, CJ6, MG9
1      supplies, diaoxy            C4C, KR3

One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
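The note above mentions that matching the sets of pairs directly is more robust than grouping by aggregated strings. Outside the database, that idea can be sketched in a few lines of Python; this is an illustration on the same simulated data, not the SQL set-matching solution the answer alludes to, and the variable names are made up for the example.

```python
from collections import Counter, defaultdict

# Same simulated rows as in the SQL test data: (id, therapeutic_class, generic_name).
rows = [
    (1, "GY6", "insulin"), (1, "MH4", "maleate"), (1, "KJ*", "glargine"),
    (2, "GY6", "supplies"), (2, "C4C", "diaoxy"),
    (3, "GY6", "insulin"), (3, "MH4", "maleate"), (3, "KJ*", "glargine"),
]

# Collect the set of (class, name) pairs for each patient.
pairs_by_id = defaultdict(set)
for patient_id, therapeutic_class, generic_name in rows:
    pairs_by_id[patient_id].add((therapeutic_class, generic_name))

# Patients match when their sets are equal; frozenset makes the set hashable.
counts = Counter(frozenset(pairs) for pairs in pairs_by_id.values())

for combo, cnt in counts.items():
    print(cnt, sorted(combo))
```

Because whole sets are compared, there is no string length limit and no dependence on the order in which the pairs are concatenated.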

You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;

bigQuery - how to use row values to create columns for a new table

I have the following genomic table (over 12K rows) in BigQuery. A long list of the PIK3CA_features (column 2) are related to the same sample_id (column 1)
Row sample_id PIK3CA_features
1 hu011C57 chr3_3930069__TGT
2 hu011C57 chr3_3929921_TC
3 hu011C57 chr3_3929739_TC
4 hu011C57 chr3_3929813__T
5 hu011C57 chr3_3929897_GA
6 hu011C57 chr3_3929977_TC
7 hu011C57 chr3_3929783_TC
I would like to generate the following table:
Row sample_id chr3_3930069__TGT chr3_3929921_TC chr3_3929739_TC
1 hu011C57 1 1 0
2 hu011C58 0
Meaning, one row for every sample ID and a 1/0 if the PIK3CA_feature exist at this sample.
Any idea how to easily generate this table?
Many thanks for any idea!

The only idea that comes to mind is using the concepts of ARRAYS and STRUCTS to get somewhat close to what you need, like so:
WITH data AS(
SELECT 'hu011C57' sample_id, 'chr3_3930069__TGT' PIK3CA_features union all
SELECT 'hu011C57', 'chr3_3929921_TC' union all
SELECT 'hu011C57', 'chr3_3929739_TC' union all
SELECT 'hu011C57', 'chr3_3929813__T' union all
SELECT 'hu011C57', 'chr3_3929897_GA' union all
SELECT 'hu011C57', 'chr3_3929977_TC' union all
SELECT 'hu011C57', 'chr3_3929783_TC' union all
SELECT 'hu011C58', 'chr3_3929783_TC' union all
SELECT 'hu011C58', 'chr3_3929921_TC'
),
all_features AS (
SELECT DISTINCT PIK3CA_features FROM data
),
aggregated_samples AS(
SELECT
sample_id,
ARRAY_AGG(DISTINCT PIK3CA_features) features
FROM data
GROUP BY sample_id
)
SELECT
sample_id,
ARRAY(
SELECT AS STRUCT
PIK3CA_features,
PIK3CA_features IN (SELECT feature FROM UNNEST(features) feature) AS present
FROM all_features
ORDER BY PIK3CA_features
) features
FROM aggregated_samples
This will return one row per sample_id, with a corresponding array of structs holding each feature and its presence in that sample_id.
As BigQuery natively supports this type of data structure, maybe you could keep this representation for your data without losing any capacity for advanced analyses such as analytic functions, subqueries and so on.

You can accomplish this by grouping on the sample id.
SELECT
sample_id,
COUNTIF(PIK3CA_features = 'chr3_3930069__TGT') as chr3_3930069__TGT,
COUNTIF(PIK3CA_features = 'chr3_3929921_TC') as chr3_3929921_TC,
COUNTIF(PIK3CA_features = 'chr3_3929739_TC') as chr3_3929739_TC
FROM [your_table]
GROUP BY sample_id;
Assuming you have no duplicate PIK3CA_features per sample id, this should give you what you need.
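With a long list of features, writing one COUNTIF line per feature by hand gets tedious. One possible workaround is to generate the statement from the distinct feature values. The sketch below is plain Python that only builds the SQL text; build_pivot_sql is a hypothetical helper, your_table is a placeholder, and it assumes every feature value is also a valid column name (true for the values shown in the question).

```python
import re

def build_pivot_sql(features, table="your_table"):
    """Build a query with one COUNTIF column per feature value."""
    columns = []
    for feature in features:
        # Refuse values that could not be used directly as a column alias.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", feature):
            raise ValueError(f"not usable as a column name: {feature!r}")
        columns.append(f"  COUNTIF(PIK3CA_features = '{feature}') AS {feature}")
    return (
        "SELECT\n  sample_id,\n"
        + ",\n".join(columns)
        + f"\nFROM {table}\nGROUP BY sample_id"
    )

# The feature list would normally come from SELECT DISTINCT PIK3CA_features.
sql = build_pivot_sql(["chr3_3930069__TGT", "chr3_3929921_TC", "chr3_3929739_TC"])
print(sql)
```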

Custom SORT BY SQL

I'm new to the community but have referenced it many times in the past. I have an issue I'm trying to overcome in Access, specifically with a SORT BY issue in SQL.
Long story short, I need to create a report based on the results of several different queries. I used a Union query to skirt the "Query is too complex" issue. The results of the query aren't in the order I'd like them, though.
Since this UNION query is not based on one specific table, rather the results of many queries, I'm not able to sort by a specific column header.
I want to sort the results in the order in which they are written in the SQL statement. Can anyone provide some insight into how to do this? I've attempted several different ways but always end up with an error message. Here's the code, and any help is greatly appreciated.
SELECT [Aqua-Anvil_Total].Expr1
FROM [Aqua-Anvil_Total];
UNION SELECT [Aqua-Reslin_Total].Expr1
FROM [Aqua-Reslin_Total];
UNION SELECT [Aqua_Zenivex_Total].Expr1
FROM [Aqua_Zenivex_Total];
UNION SELECT [Aqualuer_20-20_Total].Expr1
FROM [Aqualuer_20-20_Total];
UNION SELECT [Avalon_Total].Expr1
FROM [Avalon_Total];
UNION SELECT [BVA_13_Total].Expr1
FROM [BVA_13_Total];
UNION SELECT [Deltagard_Total].Expr1
FROM [Deltagard_Total];
UNION SELECT [Envion_Total].Expr1
FROM [Envion_Total];
UNION SELECT [Scourge_18-54_Total].Expr1
FROM [Scourge_18-54_Total];
UNION SELECT [Zenivex_E20_Total].Expr1
FROM [Zenivex_E20_Total];

This uses union all instead of union, so if you are using union to remove duplicates, there would be more work to do after this.
select Expr1
from (
select [Aqua-Anvil_Total].Expr1, 0 as sort
from [Aqua-Anvil_Total]
union all select [Aqua-Reslin_Total].Expr1, 1 as sort
from [Aqua-Reslin_Total]
union all select [Aqua_Zenivex_Total].Expr1, 2 as sort
from [Aqua_Zenivex_Total]
union all select [Aqualuer_20-20_Total].Expr1, 3 as sort
from [Aqualuer_20-20_Total]
union all select [Avalon_Total].Expr1, 4 as sort
from [Avalon_Total]
union all select [bva_13_Total].Expr1, 5 as sort
from [bva_13_Total]
union all select [Deltagard_Total].Expr1, 6 as sort
from [Deltagard_Total]
union all select [Envion_Total].Expr1, 7 as sort
from [Envion_Total]
union all select [Scourge_18-54_Total].Expr1, 8 as sort
from [Scourge_18-54_Total]
union all select [Zenivex_E20_Total].Expr1, 9 as sort
from [Zenivex_E20_Total]
) as u
order by u.sort

Need to arrange employee names as per their city column wise

I have written a query which extracts the data from different columns group by city name.
My query is as follows:
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid;
The output of the query is employee names in a single column grouped by their cities.
Now I would like to enhance the above query to classify the employees by their city names in different columns.
I tried using pivot to make this work. Here is my pivot query:
select * from (
select q.first_name
from (select employee_id as eid,first_name,city
from employees
group by city,first_name,employee_id
order by first_name)q
, employees e
where e.employee_id = q.eid
) pivot
(for city in (select city from employees))
I get some syntax issue saying missing expression and I am not sure how to use pivot to achieve the below expected output.
Expected Output:
DFW      CH     NY
-------  -----  --------
TripeH   John   Hitman
Batista  Cena   Yokozuna
Rock     James  Mysterio
Appreciate if anyone can guide me in the right direction.

Unfortunately what you are trying to do is not possible, at least not in "straight" SQL - you would need dynamic SQL, or a two-step process (in the first step generating a string that is a new SQL statement). Complicated.
The problem is that you are not including a fixed list of city names (as string literals). You are trying to create columns based on whatever you get from (select city from employees). Thus the number and names of the columns are not known until the Oracle engine reads the data from the table, but before the engine starts it must already know what all the columns will be. Contradiction.
Note also that if this was possible, you almost surely would want (select distinct city from employees).
ADDED: The OP asks a follow-up question in a comment (see below).
The ideal arrangement is for the cities to be in their own, smaller table, and the "city" in the employees table to have a foreign key constraint so that the "city" thing is manageable. You don't want one HR clerk to enter New York, another to enter New York City and a third to enter NYC for the same city. One way or the other, first try your code by replacing the subquery that follows the operator IN in the pivot clause with simply the comma-separated list of string literals for the cities: ... IN ('DFW', 'CH', 'NY'). Note that the order in which you put them in this list will be the order of the columns in the output. I didn't check the entire query to see if there are any other issues; try this and let us know what happens.
Good luck!
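The two-step process mentioned at the start of this answer could look roughly like this: step one reads the distinct cities, step two builds the literal list for the pivot clause. The sketch below covers only the string-building part in Python; pivot_in_list is a hypothetical helper and the city list is hard-coded where a real script would fetch it from the employees table.

```python
def pivot_in_list(cities):
    """Return a quoted, comma-separated list of distinct city literals."""
    seen = []
    for city in cities:
        if city not in seen:
            seen.append(city)  # keep first-seen order, drop duplicates
    # Double any single quote so each value stays a valid SQL string literal.
    return ", ".join("'" + c.replace("'", "''") + "'" for c in seen)

cities = ["DFW", "CH", "NY", "DFW"]  # e.g. the result of step one
print(f"... IN ({pivot_in_list(cities)})")
# ... IN ('DFW', 'CH', 'NY')
```

The order of the generated literals becomes the order of the output columns, as noted above.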

Maybe you need to transpose your result. I think DECODE or CASE works best for your case:
select
(CASE WHEN city = 'DFW' THEN first_name END) DFW,
(CASE WHEN city = 'CH' THEN first_name END) CH,
(CASE WHEN city = 'NY' THEN first_name END) NY
FROM employees
order by first_name

Normally I would "edit" my first answer, but the question has changed so much, it's quite different from the original one so my older answer can't be "edited" - this now needs a completely new answer.
You can do what you want with pivoting, as I show below. Wondering why you want to do this in basic SQL and not by using reporting tools, which are written specifically for reporting needs. There's no way you need to keep your data in the pivoted format in the database.
You will see 'York' twice in the Chicago column; you will recognize that's on purpose (you will see I had a duplicate row in the "test" table at the top of my code); this is to demonstrate a possible defect of your arrangement.
Before you ask if you could get the list but without the row numbers - first, if you are simply generating a set of rows, those are not ordered. If you want things ordered for reporting purposes, you can do what I did, and then select "'DFW'", "'CHI'", "'NY'" from the query I wrote. Relational theory and the SQL standard do not guarantee the row order will be preserved, but Oracle apparently does preserve it, at least in current versions; you can use that solution at your own risk.
max(name) in the pivot clause may look odd to the uninitiated; one of the weird limitations of the PIVOT operator in Oracle is that it requires an aggregate function to be used, even if it's over a set of exactly one element.
Here's the code:
with t (city, name) as -- setting up input data for testing
(
select 'DFW', 'Smith' from dual union all
select 'CHI', 'York' from dual union all
select 'DFW', 'Matsumoto' from dual union all
select 'NY', 'Abu Osman' from dual union all
select 'DFW', 'Adams' from dual union all
select 'CHI', 'Wilson' from dual union all
select 'CHI', 'Arenas' from dual union all
select 'NY', 'Theodore' from dual union all
select 'CHI', 'McGhee' from dual union all
select 'NY', 'Zhou' from dual union all
select 'NY' , 'Simpson' from dual union all
select 'CHI', 'Narayanan' from dual union all
select 'CHI', 'York' from dual union all
select 'NY', 'Perez' from dual
)
select * from
(
select row_number() over (partition by city order by name) rn,
city, name
from t
)
pivot (max(name) for city in ('DFW', 'CHI', 'NY') )
order by rn
/
And the output:
        RN 'DFW'     'CHI'     'NY'
---------- --------- --------- ---------
         1 Adams     Arenas    Abu Osman
         2 Matsumoto McGhee    Perez
         3 Smith     Narayanan Simpson
         4           Wilson    Theodore
         5           York      Zhou
         6           York

6 rows selected.
