I am looking into the Oracle SQL MODEL clause. I am trying to write dynamic Oracle SQL which can be adapted to run for a varying number of columns each time, using this MODEL clause. However, I am struggling to see how I could adapt this (even using PL/SQL) into a dynamic/generic query or procedure.
Here is a rough view of the table I am working on:
OWNER||ACCOUNT_YEAR||ACCOUNT_NAME||PERIOD_1||PERIOD_2||PERIOD_3||PERIOD_4||PERIOD_5||PERIOD_6||....
---------------------------------------------------------------------------------------------------
9640|| 2018 ||something 1|| 34 || 444 || 982 || 55 || 42 || 65 ||
9640|| 2018 ||something 2|| 333 || 65 || 666 || 78 || 44 || 55 ||
9640|| 2018 ||something 3|| 6565 || 783 || 32 || 12 || 46 || 667 ||
Here is what I have so far:
select OWNER, PERIOD_1, PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, PERIOD_7, PERIOD_8, PERIOD_9, PERIOD_10, PERIOD_11, PERIOD_12, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (PERIOD_1,PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, PERIOD_7, PERIOD_8, PERIOD_9, PERIOD_10, PERIOD_11, PERIOD_12)
RULES
(
PERIOD_1[2021] = PERIOD_1[2018] * 1.05,
PERIOD_2[2021] = PERIOD_2[2018] * 1.05,
PERIOD_3[2021] = PERIOD_3[2018] * 1.05,
PERIOD_4[2021] = PERIOD_4[2018] * 1.05,
PERIOD_5[2021] = PERIOD_5[2018] * 1.05,
PERIOD_6[2021] = PERIOD_6[2018] * 1.05,
PERIOD_7[2021] = PERIOD_7[2018] * 1.05,
PERIOD_8[2021] = PERIOD_8[2018] * 1.05,
PERIOD_9[2021] = PERIOD_9[2018] * 1.05,
PERIOD_10[2021] = PERIOD_10[2018] * 1.05,
PERIOD_11[2021] = PERIOD_11[2018] * 1.05,
PERIOD_12[2021] = PERIOD_12[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR asc;
As you can see in the MEASURES and RULES sections, I am currently hardcoding each period column into this query.
I want to be able to use this MODEL clause (specifically the rules part) in a flexible way, so I can have a query which could be run for, say, just periods 1-3, or periods 5-12.
I have tried looking into this, but all examples show the left-hand side of a rule (e.g. PERIOD_12[2021] = ...) explicitly referring to a column in a table, rather than a parameter or variable I can simply swap in for something else.
Any help on how I might accomplish this through SQL or PL/SQL would be greatly appreciated.
First, you should try to avoid dynamic columns by changing the table structure to a simpler format. SQL is much simpler if you store the data vertically instead of horizontally - use multiple rows instead of multiple columns.
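For illustration, a vertical version of the table in the question might look like this (a sketch; the table and column names are made up):

```sql
create table account_periods
(
    owner         number,
    account_year  number,
    account_name  varchar2(100),
    period_number number,          -- 1 through 12, one row per period
    quantity      number
);
```

New periods then become new rows instead of new columns, and queries no longer depend on how many period columns exist.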
If you can't change the data structure, you still want to keep the MODEL query as simple as possible, because the MODEL clause is a real pain to work with. Transform the table from columns to rows using UNPIVOT, run a simplified MODEL query, and then transform the results back if necessary.
If you really, really need dynamic columns in a pure SQL statement, you'll either need to use an advanced data type like Gary Myers suggested, or use the Method4 solution below.
Sample Schema
To make the examples fully reproducible, here's the sample data I used, along with the MODEL query (which I had to slightly modify to only reference 6 variables and the new table name).
create table data_table
(
owner number,
account_year number,
account_name varchar2(100),
period_1 number,
period_2 number,
period_3 number,
period_4 number,
period_5 number,
period_6 number
);
insert into data_table
select 9640, 2018 ,'something 1', 34 , 444 , 982 , 55 , 42 , 65 from dual union all
select 9640, 2018 ,'something 2', 333 , 65 , 666 , 78 , 44 , 55 from dual union all
select 9640, 2018 ,'something 3', 6565 , 783 , 32 , 12 , 46 , 667 from dual;
commit;
MODEL query:
select OWNER, PERIOD_1, PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (PERIOD_1,PERIOD_2, PERIOD_3, PERIOD_4, PERIOD_5, PERIOD_6)
RULES
(
PERIOD_1[2021] = PERIOD_1[2018] * 1.05,
PERIOD_2[2021] = PERIOD_2[2018] * 1.05,
PERIOD_3[2021] = PERIOD_3[2018] * 1.05,
PERIOD_4[2021] = PERIOD_4[2018] * 1.05,
PERIOD_5[2021] = PERIOD_5[2018] * 1.05,
PERIOD_6[2021] = PERIOD_6[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME asc;
Results:
OWNER PERIOD_1 PERIOD_2 PERIOD_3 PERIOD_4 PERIOD_5 PERIOD_6 ACCOUNT_YEAR ACCOUNT_NAME
----- -------- -------- -------- -------- -------- -------- ------------ ------------
9640 35.7 466.2 1031.1 57.75 44.1 68.25 2021 something 1
9640 349.65 68.25 699.3 81.9 46.2 57.75 2021 something 2
9640 6893.25 822.15 33.6 12.6 48.3 700.35 2021 something 3
UNPIVOT approach
This example uses static code to demonstrate the syntax, but this can also be made more dynamic if necessary, perhaps through PL/SQL that creates temporary tables.
create table unpivoted_data as
select *
from data_table
unpivot (quantity for period_code in (period_1 as 'P1', period_2 as 'P2', period_3 as 'P3', period_4 as 'P4', period_5 as 'P5', period_6 as 'P6'));
With unpivoted data, the MODEL clause becomes simpler. Instead of listing a rule for each period, simply partition by the PERIOD_CODE:
select *
from unpivoted_data
where OWNER IN ('9640')
and (OWNER, ACCOUNT_YEAR, ACCOUNT_NAME) in
(
select owner, account_year, account_name
from unpivoted_data
where period_code = 'P1'
and quantity is not null
)
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME, PERIOD_CODE)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (QUANTITY)
RULES
(
QUANTITY[2021] = QUANTITY[2018] * 1.05
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME, PERIOD_CODE;
Results:
OWNER ACCOUNT_YEAR ACCOUNT_NAME PERIOD_CODE QUANTITY
----- ------------ ------------ ----------- --------
9640 2018 something 1 P1 34
9640 2018 something 1 P2 444
9640 2018 something 1 P3 982
...
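If you need the original horizontal layout at the end, the unpivoted rows (or the results of the MODEL query over them) can be turned back into columns with PIVOT. A sketch against the UNPIVOTED_DATA table created above; MAX works as the aggregate because each (owner, year, name, period) combination holds a single quantity:

```sql
select *
from
(
    select owner, account_year, account_name, period_code, quantity
    from unpivoted_data
)
pivot (max(quantity) for period_code in ('P1' as period_1, 'P2' as period_2, 'P3' as period_3,
                                         'P4' as period_4, 'P5' as period_5, 'P6' as period_6))
order by account_year, account_name;
```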
Dynamic SQL in SQL
If you really need to do this all in one query, my open source package Method4 can help. Once the package is
installed, you call it by passing in a query that will generate the query you want to run.
This query returns the same results as the previous MODEL query, but will automatically adjust based on the columns in the table.
select * from table(method4.dynamic_query(
q'[
--Generate the MODEL query.
select
replace(replace(q'<
select OWNER, #PERIOD_COLUMN_LIST#, ACCOUNT_YEAR, ACCOUNT_NAME
from DATA_TABLE
where OWNER IN ('9640') and PERIOD_1 is not null
MODEL ignore nav
Return UPDATED ROWS
PARTITION BY (OWNER, ACCOUNT_NAME)
DIMENSION BY (ACCOUNT_YEAR)
MEASURES (#PERIOD_COLUMN_LIST#)
RULES
(
#RULES#
)
ORDER BY ACCOUNT_YEAR, ACCOUNT_NAME asc
>', '#PERIOD_COLUMN_LIST#', period_column_list)
, '#RULES#', rules) sql_statement
from
(
--List of columns.
select
listagg(column_name, ', ') within group (order by column_id) period_column_list,
listagg(column_name||'[2021] = '||column_name||'[2018] * 1.05', ','||chr(10)) within group (order by column_id) rules
from user_tab_columns
where table_name = 'DATA_TABLE'
and column_name like 'PERIOD%'
)
]'
));
Don't.
You can get an idea of the underlying obstruction if you understand the PARSE, BIND, EXECUTE flow of SQL as demonstrated by the DBMS_SQL package:
https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SQL.html#GUID-BF7B8D70-6A09-4E04-A216-F8952C347BAF
A cursor is opened and an SQL statement is parsed once. After it is parsed, DESCRIBE_COLUMNS can be called, which tells you definitively which columns will be returned by the execution of that SQL statement. From that point you can do multiple BIND and EXECUTE rounds, putting different values for variables into the same statement and re-running. Each EXECUTE may be followed by one or more FETCHes. None of the bind, execute or fetch steps can affect which columns are returned (either in number of columns, name, order or datatype).
The only way to change the columns returned is to parse a different SQL statement.
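That flow can be seen directly in a minimal DBMS_SQL sketch (here using the DATA_TABLE from the answers above):

```sql
declare
    v_cursor  integer := dbms_sql.open_cursor;
    v_count   integer;
    v_columns dbms_sql.desc_tab;
begin
    -- PARSE happens once; the column list is fixed from this point on.
    dbms_sql.parse(v_cursor, 'select * from data_table where owner = :b1',
                   dbms_sql.native);
    -- DESCRIBE_COLUMNS tells us definitively what will come back.
    dbms_sql.describe_columns(v_cursor, v_count, v_columns);
    for i in 1 .. v_count loop
        dbms_output.put_line(v_columns(i).col_name);
    end loop;
    -- BIND/EXECUTE/FETCH could follow here; none of them can change the columns.
    dbms_sql.close_cursor(v_cursor);
end;
/
```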
Depending on what you want at the end, you might be able to use a complex datatype (such as XML or JSON) to return data with different internal structures from the same statement (or even in different rows returned by the same statement).
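As a sketch of that idea (assuming Oracle 12.2 or later for JSON_OBJECT): with ABSENT ON NULL, keys for null periods are dropped, so different rows of the same query can carry different internal structures:

```sql
select account_name,
       json_object('year'     value account_year,
                   'period_1' value period_1,
                   'period_2' value period_2
                   absent on null) as payload
from data_table;
```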
I want to transform my table into the below result, but could not think of a solution here.
If that id column was a metric (number) on which you can aggregate, then you could do something like this with the PIVOT statement:
WITH in_table AS (
SELECT 1 AS sku_id, 0.5 AS id
UNION ALL
SELECT 1 AS sku_id, 0.6 AS id
UNION ALL
SELECT 1 AS sku_id, 0.7 AS id
UNION ALL
SELECT 2 AS sku_id, 0.3 AS id
),
out_table AS (
SELECT * FROM
(SELECT sku_id, id FROM in_table)
PIVOT(SUM(id) AS sum_id FOR sku_id IN (1, 2, 3))
)
-- to check the input data as a tabular output
-- SELECT *
-- FROM in_table;
-- to check the output data as a tabular output
SELECT *
FROM out_table;
With that id field holding categorical values, I am not sure you could achieve a similar outcome. If you try to put just id in that PIVOT, you will see this error:
PIVOT expression must be an aggregate function call at [34:9]
Maybe there are other ways to achieve exactly what you ask. However, I hope the sample SQL above helps in putting you towards the right direction.
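If the goal is just to show the categorical values per sku_id, one common workaround is to wrap the column in an aggregate that accepts strings, such as MAX (a sketch with made-up string values; when a cell can hold several values you may prefer STRING_AGG):

```sql
WITH in_table AS (
    SELECT 1 AS sku_id, 'red' AS id
    UNION ALL
    SELECT 1 AS sku_id, 'blue' AS id
    UNION ALL
    SELECT 2 AS sku_id, 'green' AS id
)
SELECT * FROM
(SELECT sku_id, id FROM in_table)
PIVOT(MAX(id) AS max_id FOR sku_id IN (1, 2, 3));
```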
I am trying to summarize some massive tables in a way that can help me investigate further some data issues. There are hundreds of thousands of rows, and roughly 80+ columns of data, in most of these tables.
From each table, I have already run queries to throw out any columns that are all nulls or hold only one value. I've done this for two reasons: for my purposes, a single value or nulls in a column are not interesting to me and provide little information about the data; additionally, the next step is that I want to query each of the remaining columns and return up to 30 distinct values from each column (if the column has more than 30 distinct values, we show the first 30).
Here is the general format of output I wish to create:
Column_name(total_num_distinct): distinct_val1(val1_count), distinct_val2(val2_couunt), ... distinct_val30(val30_count)
Assuming my data fields are integer, float, and varchar2 data types, this is the PL/SQL I was trying to use to generate that output:
declare
begin
for rw in (
select column_name colnm, num_distinct numd
from all_tab_columns
where
owner = 'scz'
and table_name like 'tbl1'
and num_distinct > 1
order by num_distinct desc
) loop
dbms_output.put(rw.colnm || '(' || rw.numd || '): ');
for rw2 in (
select dis_val, cnt from (
select rw.colnm dis_val, count(*) cnt
from tbl1
group by rw.colnm
order by 2 desc
) where rownum <= 30
) loop
dbms_output.put(rw2.dis_val || '(' || rw2.cnt || '), ');
end loop;
dbms_output.put_line(' ');
end loop;
end;
I get the output I expect from the 1st loop, but the 2nd loop, which is supposed to output examples of the unique values in each column along with the frequency of occurrence of the 30 most frequent values, is not working as I intended. Instead of unique values and the number of times each occurs in the field, I get the column names and the count of total records in the table.
If the 1st loop suggests the first 4 columns in 'tbl1' with more than 1 distinct value are the following:
| colnm | numd |
|-------|------|
| Col1  |    2 |
| Col3  |    4 |
| Col7  |   17 |
| Col12 |   30 |
... then the full output of 1st and 2nd loop together looks something like the following from my SQL:
Col1(2): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count)
Col3(4): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count)
Col7(17): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count), .... , ColX(tbl1_tot_rec_count)
Col12(30): Col1(tbl1_tot_rec_count), Col3(tbl1_tot_rec_count), Col7(tbl1_tot_rec_count), Col12(tbl1_tot_rec_count), .... , ColX(tbl1_tot_rec_count)
The output looks cleaner with real data: each table outputs somewhere between 20-50 lines (i.e. columns with more than 2 values), and listing 30 unique values for each field (with their counts) only requires a little scrolling, which isn't impractical. Just to give you an idea, the output would look more like this if it was working correctly (with fake values in my example):
Col1(2): DisVal1(874,283), DisVal2(34,578),
Col3(4): DisVal1(534,223), DisVal2(74,283), DisVal3(13,923), null(2348)
Col7(17): DisVal1(54,223), DisVal2(14,633), DisVal3(13,083), DisVal4(12,534), DisVal5(9,876), DisVal6(8,765), DisVal7(7654), DisVal8(6543), DisVal9(5432), ...., ...., ...., ...., ...., ...., ...., DisVal17(431)
I am not an Oracle or SQL guru, so I might not be approaching this problem in the easiest or most efficient way. While I do appreciate any better ways to approach this problem, I also want to learn why the code above is not giving me the output I expect. My goal is to run a single quick query per table that tells me which columns have interesting data I might want to examine further. I have probably 20 tables to examine, all of similar dimensions, so they are very difficult to examine comprehensively. Being able to reduce these tables this way, to see what combinations of values may exist across the various fields, would be very helpful for further queries that deep-dive into the intricacies of the data.
It's because the select rw.colnm dis_val, count(*) cnt from tbl1 group by rw.colnm order by 2 desc is not doing at all what you think, and what you want done can't be done without dynamic SQL. What it actually does is select 'a_column_of_tbl1' dis_val, count(*) cnt from tbl1 group by 'a_column_of_tbl1' order by 2 desc, grouping by a constant string. What you need to do is execute dynamically the SQL 'select ' || rw.colnm || ' dis_val, count(*) cnt from tbl1 group by ' || rw.colnm || ' order by 2 desc'.
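For example, the inner loop could be rewritten with a cursor variable and OPEN ... FOR (an untested sketch; it assumes every column value can be implicitly converted to a string):

```sql
declare
    v_cur     sys_refcursor;
    v_dis_val varchar2(4000);
    v_cnt     number;
begin
    for rw in (
        select column_name colnm, num_distinct numd
        from all_tab_columns
        where owner = 'scz'
        and table_name like 'tbl1'
        and num_distinct > 1
        order by num_distinct desc
    ) loop
        dbms_output.put(rw.colnm || '(' || rw.numd || '): ');
        -- The column name must be concatenated into the statement text;
        -- an identifier cannot be supplied as a bind variable.
        open v_cur for
            'select dis_val, cnt from (
                 select ' || rw.colnm || ' dis_val, count(*) cnt
                 from tbl1
                 group by ' || rw.colnm || '
                 order by 2 desc
             ) where rownum <= 30';
        loop
            fetch v_cur into v_dis_val, v_cnt;
            exit when v_cur%notfound;
            dbms_output.put(v_dis_val || '(' || v_cnt || '), ');
        end loop;
        close v_cur;
        dbms_output.put_line(' ');
    end loop;
end;
/
```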
Here is the (beta) query you can use to get the SQL to execute:
(I tested with the USER_* views instead of the ALL_* views to avoid getting too many results here...)
select utc.table_name, utc.column_name,
REPLACE( REPLACE(q'~select col_name, col_value, col_counter
from (
SELECT col_name, col_value, col_counter,
ROW_NUMBER() OVER(ORDER BY col_counter DESC) AS rn
FROM (
select '{col_name}' AS col_name,
{col_name} AS col_value, COUNT(*) as col_counter
from {table_name}
group by {col_name}
)
)
WHERE rn <= 30~', '{col_name}', utc.column_name), '{table_name}', utc.table_name) AS sql
from user_tab_columns utc
where not exists(
select 1
from user_indexes ui
join user_ind_columns uic on uic.table_name = ui.table_name
and uic.index_name = ui.index_name
where
ui.table_name = utc.table_name
and exists (
select 1 from user_ind_columns t where t.table_name = ui.table_name
and t.index_name = uic.index_name and t.column_name = utc.column_name
)
group by ui.table_name, ui.index_name having( count(*) = 1 )
)
and not exists(
SELECT 1
FROM user_constraints uc
JOIN user_cons_columns ucc ON ucc.constraint_name = uc.constraint_name
WHERE constraint_type IN (
'P', -- primary key
'U' -- unique
)
AND uc.table_name = utc.table_name AND ucc.column_name = utc.column_name
)
;
I have a table with three columns with an ID, a therapeutic class, and then a generic name. A therapeutic class can be mapped to multiple generic names.
ID therapeutic_class generic_name
1 YG4 insulin
1 CJ6 maleate
1 MG9 glargine
2 C4C diaoxy
2 KR3 supplies
3 YG4 insulin
3 CJ6 maleate
3 MG9 glargine
I need to first look at the individual combinations of therapeutic class and generic name, and then count how many patients have the same combination. I want my output to have three columns: the combo of generic names, the combo of therapeutic classes, and the count of the number of patients with that combination, like this:
Count Combination_generic combination_therapeutic
2 insulin, maleate, glargine YG4, CJ6, MG9
1 supplies, diaoxy C4C, KR3
One way to match patients by the sets of pairs (therapeutic_class, generic_name) is to create the comma-separated strings in your desired output, and to group by them and count. To do this right, you need a way to identify the pairs. See my Comment under the original question and my Comments to Gordon's Answer to understand some of the issues.
I do this identification in some preliminary work in the solution below. As I mentioned in my Comment, it would be better if the pairs and unique ID's existed already in your data model; I create them on the fly.
Important note: This assumes the comma-separated lists don't become too long. If you exceed 4000 characters (or approx. 32000 characters in Oracle 12, with certain options turned on), you CAN aggregate the strings into CLOBs, but you CAN'T GROUP BY CLOBs (in general, not just in this case), so this approach will fail. A more robust approach is to match the sets of pairs, not some aggregation of them. The solution is more complicated, I will not cover it unless it is needed in your problem.
with
-- Begin simulated data (not part of the solution)
test_data ( id, therapeutic_class, generic_name ) as (
select 1, 'GY6', 'insulin' from dual union all
select 1, 'MH4', 'maleate' from dual union all
select 1, 'KJ*', 'glargine' from dual union all
select 2, 'GY6', 'supplies' from dual union all
select 2, 'C4C', 'diaoxy' from dual union all
select 3, 'GY6', 'insulin' from dual union all
select 3, 'MH4', 'maleate' from dual union all
select 3, 'KJ*', 'glargine' from dual
),
-- End of simulated data (for testing purposes only).
-- SQL query solution continues BELOW THIS LINE
valid_pairs ( pair_id, therapeutic_class, generic_name ) as (
select rownum, therapeutic_class, generic_name
from (
select distinct therapeutic_class, generic_name
from test_data
)
),
first_agg ( id, tc_list, gn_list ) as (
select t.id,
listagg(p.therapeutic_class, ',') within group (order by p.pair_id),
listagg(p.generic_name , ',') within group (order by p.pair_id)
from test_data t join valid_pairs p
on t.therapeutic_class = p.therapeutic_class
and t.generic_name = p.generic_name
group by t.id
)
select count(*) as cnt, tc_list, gn_list
from first_agg
group by tc_list, gn_list
;
Output:
CNT TC_LIST GN_LIST
--- ------------------ ------------------------------
1 GY6,C4C supplies,diaoxy
2 GY6,KJ*,MH4 insulin,glargine,maleate
You are looking for listagg() and then another aggregation. I think:
select therapeutics, generics, count(*)
from (select id, listagg(therapeutic_class, ', ') within group (order by therapeutic_class) as therapeutics,
listagg(generic_name, ', ') within group (order by generic_name) as generics
from t
group by id
) t
group by therapeutics, generics;
How do I add a row to the end of this SELECT so I can see the total of the grouped rows? (I need the totals for 'money' and 'requests'.)
SELECT
organizations.name || ' - ' || section.name as Section,
SUM(requests.money) as money,
COUNT(*) as requests
FROM
schema.organizations
-- INNER JOINs omitted --
WHERE
-- omitted --
GROUP BY
-- omitted --
ORDER BY
-- omitted --
Running the above produces:
|*Section* | *Money* | *Requests*|
|-----------|---------|-----------|
|BMO - HR |564 |10 |
|BMO - ITB |14707 |407 |
|BMO - test |15 |7 |
Now what I want is to add a total to the end of that which would display:
|BMO - Total|15286 |424 |
I have tried a few things, and ended up trying to wrap the SELECT in a WITH statement, and failing:
WITH w as (
--SELECT statement from above--
)
SELECT * FROM w UNION ALL
SELECT 'Total', money, requests from w
This produces weird results (I'm getting four total rows when there should be just one).
You can achieve this by using a UNION query. In the query below, I add an artificial sortorder column and wrap the union query in an outer query so that the sum line appears at the bottom.
[I'm assuming you'll be adding your joins and group by clauses...]
SELECT section, money, requests FROM -- outer select, to get the sorting right.
( SELECT
organizations.name || ' - ' || section.name as Section,
SUM(requests.money) as money,
COUNT(*) as requests,
0 AS sortorder -- added a sortorder column
FROM
schema.organizations
-- INNER JOINs omitted --
WHERE
-- omitted --
GROUP BY
-- omitted --
-- ORDER BY is not used here
UNION
SELECT
'BMO - Total' as section,
SUM(requests.money) as money,
COUNT(*) as requests,
1 AS sortorder
FROM
schema.organizations
-- add inner joins and where clauses as before
) AS unionquery
ORDER BY sortorder -- could also add other columns to sort here
The rollup function in this https://stackoverflow.com/a/54913166/1666637 answer might be a convenient way to do this. More on this and related functions here: https://www.postgresql.org/docs/devel/queries-table-expressions.html#QUERIES-GROUPING-SETS
Something like this (untested code):
WITH w as (
--SELECT statement from above--
)
SELECT section, SUM(money) AS money, SUM(requests) AS requests
FROM w
GROUP BY ROLLUP(section);
ROLLUP adds one extra row, with section set to NULL, holding the grand totals. If the grouping key is several columns, note the double parentheses in ROLLUP((col1, col2)): they are significant, treating the list as a single unit so you get just the one total row instead of subtotals per column.
Using SQL*Plus to generate a listing that is e-mailed to a customer, e.g.:
SET MARKUP HTML ON
SPOOL spool.html
SELECT order_number, entry_date, delivery_date, customer_order_number, order_totals_quantity, TRUNC(order_totals_sqm,2), order_totals_net_value FROM orders WHERE entry_date = SYSDATE;
How can I also create a row that shows the total of the listed order_totals fields and keep them in line with those fields?
i.e. if I did a separate SELECT COUNT() for those fields it would list them under the first 3 when really they need to be underneath the original SELECT.
Update: This is what I'm looking for, if it's possible.
other columns ... order_totals_quantity | TRUNC(order_totals_sqm,2) | order_totals_net_value
--------------------------------------------------------------------------------------------
Total | Total | Total
Maybe...
Depends on what aggregate you're wanting and what denotes a unique record so as not to sum quantities incorrectly.
SELECT order_number, entry_date, delivery_date, customer_order_number,
sum(order_totals_quantity), sum(TRUNC(order_totals_sqm,2)), sum(order_totals_net_value)
FROM orders
WHERE entry_date = SYSDATE
GROUP BY GROUPING SETS ((order_number, entry_date, delivery_date, customer_order_number),
                        ());
Example found: http://www.oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.php
Try this [assuming you are using oracle]:
SELECT order_number, entry_date, delivery_date, customer_order_number, order_totals_quantity, TRUNC(order_totals_sqm,2), order_totals_net_value,tot.a, tot.b
FROM orders, (select sum(order_totals_quantity) a, sum(order_totals_net_value ) b from orders WHERE entry_date = SYSDATE) tot
WHERE entry_date = SYSDATE;
As you are using SQL*Plus, there is an easier method using COMPUTE commands. This has the advantage that no extra SQL runs on the server. Here is an example you can adapt for your query:
BREAK ON report
COMPUTE SUM LABEL total OF a ON report
SELECT 1 AS a FROM dual UNION ALL
SELECT 2 AS a FROM dual UNION ALL
SELECT 3 AS a FROM dual;
A
-------------
1
2
3
-------------
6
3 rows selected.
You can use other aggregates as well. Here is a link to the full documentation: COMPUTE.
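Adapted to the listing query from the question, the setup might look like this (a sketch, not tested against your schema; note the TRUNC expression needs a column alias so COMPUTE can refer to it):

```sql
BREAK ON report
COMPUTE SUM LABEL 'Total' OF order_totals_quantity order_totals_sqm order_totals_net_value ON report

SELECT order_number, entry_date, delivery_date, customer_order_number,
       order_totals_quantity,
       TRUNC(order_totals_sqm, 2) AS order_totals_sqm,
       order_totals_net_value
FROM orders
WHERE entry_date = SYSDATE;
```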