Calculate mean, median and mode for each column in table and transpose so there is one row per column and the aggregates are columns (not using union) - sql

Let's say I have a table with numerical columns x,y & z. For data quality purposes I want to calculate the mean, median and mode for each column and present as rows, like so:
https://ibb.co/HFz1qTw
The end goal is to create an up-to-date table like the one in sys.all_columns (I'm not admin)
Is there an elegant way to do this (preferably dynamically so I don't have to enter every column name, while not using UNION) ?

Feasible but complex.
Here are the steps:
1 : Dynamically generate the SQL that computes the mean, median... for the columns you want.
2 : Run the query as a dynamic cursor.
If you need it to be really easy to use in the end, you can use a pipelined function:
3 : Create a function that takes the table name as an argument and is used like this:
select * from table(MyPipelinedFunction('MyTableName'));
4 : And you will easily get the result.
Here are some tips on each step:
Step 1 : generate SQL dynamically
-- SQL is a reserved word in PL/SQL, so the variable is named vSQL here
vSQL := 'select avg(#FIELD1#), median(#FIELD2#)... from #TABLE#';
vSQL := replace(vSQL,'#FIELD1#','Whatever');
vSQL := replace(vSQL,'#TABLE#','Whatever');
Step 2 : Run the query as dynamic cursor.
DECLARE
RefCursor SYS_REFCURSOR;
Mean NUMBER;
Median NUMBER;
...
BEGIN
OPEN RefCursor FOR vSQL;
FETCH RefCursor INTO Mean, Median...;
CLOSE RefCursor;
END;
Step 3 : pipelined function
-- Declare the types and the function in a package
TYPE Statistics IS RECORD(Mean NUMBER, Median NUMBER, ...);
TYPE StatisticsTable IS TABLE OF Statistics;
FUNCTION ComputeStatistics(TableName IN VARCHAR2) RETURN StatisticsTable PIPELINED IS
S Statistics;
BEGIN
-- Get the table field names, in case they are different, using a static cursor over:
-- SELECT column_name FROM USER_TAB_COLUMNS WHERE table_name = upper(TableName)
-- Use the results to build the SQL query and run the dynamic cursor as in Step 1 & 2
-- After this step, results are stored in the variables Mean, Median...
-- Output result (assign the record field by field)
S.Mean := Mean;
S.Median := Median;
...
PIPE ROW (S);
RETURN;
END ComputeStatistics;
Step 4 : Get the result
select * from table(ComputeStatistics('WhateverTable'));
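Putting the steps above together, a minimal sketch might look like the package below. The package name col_stats, the type and field names, and the table name my_table are made up for illustration. It assumes the table is in your own schema (hence USER_TAB_COLUMNS) and that only NUMBER and FLOAT columns should be profiled.

```sql
-- Hypothetical package: one output row per numeric column of the given table.
create or replace package col_stats as
  type stat_rec is record (
    column_name  varchar2(128),
    mean_value   number,
    median_value number,
    mode_value   number
  );
  type stat_tab is table of stat_rec;
  function compute(p_table in varchar2) return stat_tab pipelined;
end col_stats;
/
create or replace package body col_stats as
  function compute(p_table in varchar2) return stat_tab pipelined is
    r stat_rec;
  begin
    -- Static cursor over the data dictionary to find the numeric columns
    for c in (select column_name
                from user_tab_columns
               where table_name = upper(p_table)
                 and data_type in ('NUMBER', 'FLOAT')
               order by column_id)
    loop
      r.column_name := c.column_name;
      -- Identifiers cannot be bound, so they are concatenated;
      -- dbms_assert.enquote_name rejects malformed names.
      execute immediate
        'select avg(x), median(x), stats_mode(x) from (select '
        || dbms_assert.enquote_name(c.column_name, false)
        || ' as x from '
        || dbms_assert.enquote_name(upper(p_table), false) || ')'
        into r.mean_value, r.median_value, r.mode_value;
      pipe row (r);
    end loop;
    return;
  end compute;
end col_stats;
/
-- Usage (my_table is a placeholder):
select * from table(col_stats.compute('my_table'));
```

This runs one query per column, so on a big table it scans the table several times. That is usually acceptable for an occasional data quality check, but it is a trade-off worth knowing about.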

I would just unpivot and aggregate:
select c.col,
min(c.val), max(c.val), avg(c.val), median(c.val), stats_mode(c.val)
from t cross join lateral
(select 'x' as col, x as val from dual union all
select 'y', y from dual union all
select 'z', z from dual
) c
group by c.col;
You can turn this into dynamic SQL using PL/SQL. In practice, though, I would just query for the column names and generate the SQL directly:
select replace('
select ''[col]'' as col,
min(t.[col]), max(t.[col]), avg(t.[col]), median(t.[col]), stats_mode(t.[col])
from t t union all
', '[col]', column_name)
from sys.all_tab_columns c join
sys.all_tables t
on c.owner = t.owner and
c.table_name = t.table_name
where t.table_name = 'T'; -- only the columns of the table being profiled
You can then just remove the last union all and run the code.
Note: This doesn't use listagg() because of limits on string length. But you can combine this logic into a single string if you don't have too many columns.
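For the few-columns case mentioned in the note, a listagg() variant could look roughly like this. It is a sketch only: it assumes the table is called T and lives in the current schema, that the column names are ordinary upper-case identifiers, and that the generated text stays under the 4000-byte limit.

```sql
-- Builds a single statement; listagg puts the separator only between items,
-- so there is no trailing "union all" to remove.
select listagg(
         replace('select ''[col]'' as col, min([col]), max([col]), avg([col]), median([col]), stats_mode([col]) from t',
                 '[col]', column_name),
         ' union all '
       ) within group (order by column_id) as stmt
from user_tab_columns
where table_name = 'T'
  and data_type in ('NUMBER', 'FLOAT');
```

Copy the value of stmt and run it as a query.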

Related

Oracle function with select all from tables

SELECT DISTINCT L.* FROM LABALES L , MATCHES M
WHERE M.LIST LIKE '%ENG'
ORDER BY L.ID
I need to create function with this select, I tried this but it doesn't work.
CREATE OR REPLACE FUNCTION getSoccerLists
RETURN varchar2 IS
list varchar2(2000);
BEGIN
SELECT DISTINCT L.* FROM LABALES L , MATCHES M
WHERE M.LIST LIKE '%ENG'
ORDER BY L.ID
return list;
END;
How can I create a function that returns all columns from table L?
Thanks
You may use an implicit result with DBMS_SQL.RETURN_RESULT (Oracle 12c and above) in a procedure that opens a cursor for your query.
CREATE OR REPLACE PROCEDURE getSoccerLists
AS
x SYS_REFCURSOR;
BEGIN
OPEN x FOR SELECT DISTINCT L.* FROM LABALES L
JOIN MATCHES M ON ( 1=1 ) -- join condition
WHERE M.LIST LIKE '%ENG'
ORDER BY L.ID;
DBMS_SQL.RETURN_RESULT(x);
END;
/
then simply call the procedure
EXEC getSoccerLists;
For lower versions (Oracle 11g), you may use the PRINT command to display the cursor's output, passing the ref cursor as an OUT parameter.
CREATE OR REPLACE PROCEDURE getSoccerLists (x OUT SYS_REFCURSOR)
AS
BEGIN
OPEN x FOR SELECT DISTINCT L.* FROM LABALES L
JOIN MATCHES M ON ( 1=1 ) -- join condition
WHERE M.LIST LIKE '%ENG'
ORDER BY L.ID;
END;
/
Then, in SQL*Plus, or when running as a script in SQL Developer or Toad, you can get the results like this:
VARIABLE r REFCURSOR;
EXEC getSoccerLists (:r);
PRINT r;
Another option is to use a TABLE function by defining a collection of the record type of the result within a package.
Refer to: Create an Oracle function that returns a table
I guess this question is a repetition of your previously asked question, where you wanted to get all the columns of the tables, but in separate columns. I already answered there, stating that you cannot do this if you call your function via a SELECT statement. If you call your function in an anonymous block, you can display the values in separate columns.
See: Oracle function returning all columns from tables
Alternatively, you can get the results separated by a comma(,) or pipe (|) as below:
CREATE OR REPLACE
FUNCTION getSoccerLists
RETURN VARCHAR2
IS
list VARCHAR2(2000);
BEGIN
SELECT col1
||','
||col2
||','
||col3
INTO LIST
FROM SOCCER_PREMATCH_LISTS L ,
SOCCER_PREMATCH_MATCHES M
WHERE M.LIST LIKE '%' || (L.SUB_LIST) || '%'
AND (TO_TIMESTAMP((M.M_DATE || ' ' || M.M_TIME), 'DD.MM.YYYY HH24:MI') >
(SELECT SYSTIMESTAMP AT TIME ZONE 'CET' FROM DUAL
))
ORDER BY L.ID;
Return list;
End;
Note that if the concatenated value exceeds 2000 characters, it will no longer fit in the variable, so size it accordingly.
Edit:
From your comments
I want it to return a table set of results.
You then need to create a table of varchar and then return it from the function. See below:
CREATE TYPE var IS TABLE OF VARCHAR2(2000);
/
CREATE OR REPLACE
FUNCTION getSoccerLists
RETURN var
IS
--Initialization
list VAR :=var();
BEGIN
SELECT NSO ||',' ||NAME BULK COLLECT INTO LIST FROM TEST;
RETURN list;
END;
Execution:
select * from table(getSoccerLists);
Note: In the function I have used a table called TEST and its columns. Replace them with your own table and column names.
Edit 2:
--Create a object with columns same as your select statement
CREATE TYPE v_var IS OBJECT
(
col1 NUMBER,
col2 VARCHAR2(10)
)
/
--Create a table of your object
CREATE OR REPLACE TYPE var IS TABLE OF v_var;
/
CREATE OR REPLACE FUNCTION getSoccerLists
RETURN var
IS
--Initialization
list VAR :=var();
BEGIN
--Your object above should have the same columns, with the same data types, as you are selecting here
SELECT v_var( NSO ,NAME) BULK COLLECT INTO LIST FROM TEST;
RETURN list;
END;
Execution:
select * from table(getSoccerLists);
This is not an answer on how to build a function for this, as I'd recommend making this a view instead:
CREATE OR REPLACE VIEW view_soccer_list AS
SELECT *
FROM soccer_prematch_lists l
WHERE EXISTS
(
SELECT *
FROM soccer_prematch_matches m
WHERE m.list LIKE '%' || (l.sub_list) || '%'
AND TO_TIMESTAMP((m.m_date || ' ' || m.m_time), 'DD.MM.YYYY HH24:MI') >
(SELECT SYSTIMESTAMP AT TIME ZONE 'CET' FROM DUAL)
);
Then call it in a query:
SELECT * FROM view_soccer_list ORDER BY id;
(It makes no sense to put an ORDER BY clause in a view, because you access the view like a table, and table data is considered unordered, so you could not rely on that order. The same is true for a pipelined function you'd access with FROM TABLE (getSoccerLists). Always put the ORDER BY clause in your final queries instead.)

Nested SELECT statement in FROM clause

I want to get data from a table whose name is stored in another table. Trying to do this as shown below only returns the result of the nested SELECT:
select * from (select value from ex_scheme.ex_tab where name = 'ex_name.current_table_name')
I mean, I've got equivalent result as from just
select value from ex_scheme.ex_tab where name = 'ex_name.current_table_name'
query.
UPDATED
OK, let's double-check that I was understood correctly.
I need to see the data of one table (let's call it "table1"). To do that I need to know the table's name, and I know where the name is stored: in another table (call it "names_table") in column "name" (row with column value = 'table1'). I can get it with the query
select name from names_table where value = 'table1'
If you know in advance the column and its type, you can build some dynamic SQL to dynamically query a table or another.
For example, say you have tables like the following:
create table table1(col) as (select 1 from dual);
create table table2(col) as (select 2 from dual);
create table tab_of_tabs (tab_name) as (select 'TABLE1' from dual);
You can use dynamic SQL to build a query that scans a table whose name is the result of a query:
declare
  vSQL varchar2(1000);
  vResult number;
begin
  select 'select sum(col) from ' || tab_name -- build the query
  into vSQL
  from tab_of_tabs;
  --
  execute immediate vSQL into vResult; -- run the query
  --
  dbms_output.put_line('Result: ' || vResult);
end;
/
Result: 1
PL/SQL procedure successfully completed.
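If you want the rows of the looked-up table rather than a single value, the same idea works with a ref cursor. This is only a sketch for SQL*Plus or a SQL Developer script, reusing the tab_of_tabs example above and assuming it holds exactly one row. The bind variable name rc is arbitrary.

```sql
variable rc refcursor

declare
  v_tab varchar2(128);
begin
  -- Look up the table name (raises an error unless exactly one row is found)
  select tab_name into v_tab from tab_of_tabs;
  -- dbms_assert.sql_object_name raises an error unless the name
  -- refers to an existing object, which guards the concatenation
  open :rc for 'select * from ' || dbms_assert.sql_object_name(v_tab);
end;
/

print rc
```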
If I understand correctly, you could use a nested query in a where clause. For example,
select * from table1 where table1.name in (select name from table2);
This assumes there's a column "name" in table1. The result of this query should return the rows in table1 that are in table2.
Try giving the subquery an alias:
select n.* from (select value from ex_scheme.ex_tab where name = 'ex_name.current_table_name') n;
Update:
It is in another table (call it "names_table") in column "name" (row
with column value = 'table1').
This query will work:
select n.* from (select name from ex_scheme.ex_tab where name = 'ex_name.current_table_name') n;
The subquery fetches the name of the table from the other table.

How to store multiple rows in a variable in pl/sql function?

I'm writing a pl/sql function. I need to select multiple rows from select statement:
SELECT pel.ceid
FROM pa_exception_list pel
WHERE trunc(pel.creation_date) >= trunc(SYSDATE-7)
If I use:
SELECT pel.ceid
INTO v_ceid
it only stores one value, but I need to store all the values that this select returns. Since this is a function, I can't just use a plain SELECT, because I get the error "INTO - is expected."
You can use a collection type to do that. The example below should work for you:
DECLARE
TYPE v_array_type IS VARRAY (10) OF NUMBER;
var v_array_type;
BEGIN
SELECT x
BULK COLLECT INTO
var
FROM (
SELECT 1 x
FROM dual
UNION
SELECT 2 x
FROM dual
UNION
SELECT 3 x
FROM dual
);
FOR I IN 1..3 LOOP
dbms_output.put_line(var(I));
END LOOP;
END;
So in your case, it would be something like
select pel.ceid
BULK COLLECT INTO <variable which you create>
from pa_exception_list pel
where trunc(pel.creation_Date) >= trunc(sysdate-7);
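Spelled out for the table in the question, that could look like the block below. It is a sketch: the type name t_ceid_tab and variable v_ceids are made up, and a nested table is used instead of a VARRAY so there is no fixed upper bound on the number of rows.

```sql
set serveroutput on

declare
  -- Collection type anchored to the column's data type
  type t_ceid_tab is table of pa_exception_list.ceid%type;
  v_ceids t_ceid_tab;
begin
  select pel.ceid
    bulk collect into v_ceids
    from pa_exception_list pel
   where trunc(pel.creation_date) >= trunc(sysdate - 7);

  -- v_ceids.count is 0 when nothing matched, so the loop is simply skipped
  for i in 1 .. v_ceids.count loop
    dbms_output.put_line(v_ceids(i));
  end loop;
end;
/
```

Unlike a plain SELECT INTO, BULK COLLECT does not raise NO_DATA_FOUND when no rows match. You just get an empty collection.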
If you really need to store multiple rows, look at the BULK COLLECT INTO clause and its examples. But a cursor FOR LOOP with row-by-row processing may be the better choice.
You may fetch each row into a %ROWTYPE variable and show whichever columns you want (assuming ceid is your primary key column and col1 and col2 are some other columns of your table):
SQL> set serveroutput on;
SQL> declare
l_exp pa_exception_list%rowtype;
begin
for c in ( select *
from pa_exception_list pel
where trunc(pel.creation_date) >= trunc(SYSDATE-7)
) -- to select multiple rows
loop
select *
into l_exp
from pa_exception_list
where ceid = c.ceid; -- to render only one row( ceid is primary key )
dbms_output.put_line(l_exp.ceid||' - '||l_exp.col1||' - '||l_exp.col2); -- to show the results
end loop;
end;
/
SET SERVEROUTPUT ON
BEGIN
FOR rec IN (
--an implicit cursor is created here
SELECT pel.ceid AS ceid
FROM pa_exception_list pel
WHERE trunc(pel.creation_date) >= trunc(SYSDATE-7)
)
LOOP
dbms_output.put_line(rec.ceid);
END LOOP;
END;
/
Notes from here:
In this case, the cursor FOR LOOP declares, opens, fetches from, and
closes an implicit cursor. However, the implicit cursor is internal;
therefore, you cannot reference it.
Note that Oracle Database automatically optimizes a cursor FOR LOOP to
work similarly to a BULK COLLECT query. Although your code looks as if
it fetched one row at a time, Oracle Database fetches multiple rows at
a time and allows you to process each row individually.

Selecting static values to union into another query

Essentially, my problem is that I need to run a query in Oracle that unions a static list of values ('Static' meaning it's obtained from somewhere else that I cannot get from the database, but is actually an arbitrary list of values I plug into the query) with a dynamic list of values returned from a query.
So, my initial query looks like:
select * from (select ('18776') as instanceid from dual) union (<more complex query>)
I think, hooray! And then try to do it with a longer list of static values. Turns out, I get 'Missing Right Parenthesis' if I try to run:
select ('18776','18775') as instanceid from dual
So, my basic issue is how can I integrate a list of static values into this union?
NOTE: This is a simplified example of the problem. The actual list is generated from an API before I generate a query, and so this list of "static" values is unpredictably and arbitrarily large. I'm not dealing with just 2 static values, it is an arbitrary list.
select '18776' as instanceid from dual union all
select '18775' as instanceid from dual
or
select column_value from table(sys.odcivarchar2list('18776', '18775'))
or some sort of hierarchical query that could take your comma separated-string and split it into a set of varchars.
Union these to your initial query.
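For the hierarchical-query option, one common idiom is sketched below. It assumes the values arrive as a single comma-separated string and that none of them is empty, since this pattern skips empty elements.

```sql
-- Splits '18776,18775' into one row per value
select regexp_substr('18776,18775', '[^,]+', 1, level) as instanceid
from dual
connect by regexp_substr('18776,18775', '[^,]+', 1, level) is not null;
```

In real code the string would normally be passed as a bind variable rather than repeated as a literal.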
update: "I'm not dealing with just 2 static values, it is an arbitrary list."
Still can pass to a query as a collection (below is just one of many possible approaches)
variable result refcursor
declare
  cnt int := 10;
  coll sys.odcivarchar2list := sys.odcivarchar2list();
begin
  coll.extend(cnt);
  for i in 1 .. cnt loop
    coll(i) := dbms_random.string('l', i);
  end loop;
  open :result for 'select * from table(:c)' using coll;
end;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.50
print result
COLUMN_VALUE
-------------------------------------------------------------
g
kd
qdv
soth
rvwnq
uyfhbq
xxvxvtw
eprralmd
edbcajvfq
ewveyljsjn
10 rows selected.
Elapsed: 00:00:00.01
I think you want to break it into two subqueries:
select *
from ((select '18776' as instanceid from dual)
union
(select '18775' as instanceid from dual)
union
(<more complex query>)
) t;
Note that union all performs better than union. If you know there are no duplicates (or duplicates don't matter) then use union all instead.
If you have the ability/permission to create a table type, you can do this:
CREATE OR REPLACE
TYPE TYP_NUMBER_TABLE AS TABLE OF NUMBER(11);
And then you can use the TABLE function to select from an instance of that type that you initialize on the fly in your SQL:
SELECT COLUMN_VALUE FROM TABLE(TYP_NUMBER_TABLE(1, 2, 3));
Result:
COLUMN_VALUE
------------
1
2
3

Oracle and SQLServer function evaluation in queries

Let's say I have a function call in a SELECT or WHERE clause in Oracle like this:
select a, b, c, dbms_crypto.hash(utl_raw.cast_to_raw('HELLO'),3)
from my_table
A similar example can be constructed for MS SQLServer.
What's the expected behavior in each case?
Is the HASH function going to be called once for each row in the table, or will the DBMS be smart enough to call the function just once, since it's a function with constant parameters and no side effects?
Thanks a lot.
The answer for Oracle is: it depends. The function will be called for every row selected unless the function is marked DETERMINISTIC, in which case Oracle can reuse the result, and in the test below it is called only once.
CREATE OR REPLACE PACKAGE TestCallCount AS
FUNCTION StringLen(SrcStr VARCHAR) RETURN INTEGER;
FUNCTION StringLen2(SrcStr VARCHAR) RETURN INTEGER DETERMINISTIC;
FUNCTION GetCallCount RETURN INTEGER;
FUNCTION GetCallCount2 RETURN INTEGER;
END TestCallCount;
/
CREATE OR REPLACE PACKAGE BODY TestCallCount AS
TotalFunctionCalls INTEGER := 0;
TotalFunctionCalls2 INTEGER := 0;
FUNCTION StringLen(SrcStr VARCHAR) RETURN INTEGER AS
BEGIN
TotalFunctionCalls := TotalFunctionCalls + 1;
RETURN Length(SrcStr);
END;
FUNCTION GetCallCount RETURN INTEGER AS
BEGIN
RETURN TotalFunctionCalls;
END;
FUNCTION StringLen2(SrcStr VARCHAR) RETURN INTEGER DETERMINISTIC AS
BEGIN
TotalFunctionCalls2 := TotalFunctionCalls2 + 1;
RETURN Length(SrcStr);
END;
FUNCTION GetCallCount2 RETURN INTEGER AS
BEGIN
RETURN TotalFunctionCalls2;
END;
END TestCallCount;
/
SELECT a,TestCallCount.StringLen('foo') FROM(
SELECT 0 as a FROM dual
UNION
SELECT 1 as a FROM dual
UNION
SELECT 2 as a FROM dual
);
SELECT TestCallCount.GetCallCount() AS TotalFunctionCalls FROM dual;
Output:
A TESTCALLCOUNT.STRINGLEN('FOO')
---------------------- ------------------------------
0 3
1 3
2 3
3 rows selected
TOTALFUNCTIONCALLS
----------------------
3
1 rows selected
So the StringLen() function was called three times in the first case. Now when executing with StringLen2() which is denoted deterministic:
SELECT a,TestCallCount.StringLen2('foo') from(
select 0 as a from dual
union
select 1 as a from dual
union
select 2 as a from dual
);
SELECT TestCallCount.GetCallCount2() AS TotalFunctionCalls FROM dual;
Results:
A TESTCALLCOUNT.STRINGLEN2('FOO')
---------------------- -------------------------------
0 3
1 3
2 3
3 rows selected
TOTALFUNCTIONCALLS
----------------------
1
1 rows selected
So the StringLen2() function was only called once since it was marked deterministic.
For a function not marked deterministic, you can get around this by modifying your query as such:
select a, b, c, hashed
from my_table
cross join (
select dbms_crypto.hash(utl_raw.cast_to_raw('HELLO'),3) as hashed from dual
);
For SQL server, it will be evaluated for every single row.
You will be MUCH better off by running the function once and assigning to a variable and using the variable in the query.
Short answer: it depends.
If the function accesses data, Oracle does not know whether the result will be the same for each row, so it has to call the function for each one. If, for example, your function is just a formatter that always returns the same value for the same input, you can mark it as DETERMINISTIC, which may allow Oracle to make the function call only once.
Something you may want to look into is ORACLE WITH subquery:
The WITH query_name clause lets you
assign a name to a subquery block. You
can then reference the subquery block
multiple places in the query by
specifying the query name. Oracle
optimizes the query by treating the
query name as either an inline view or
as a temporary table
I got the quoted text from here, which has plenty of examples.
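Applied to the query from the question, the WITH clause might be used like this. It is a sketch: the names h and hashed are arbitrary, and whether Oracle evaluates the subquery once or merges it into the main query is up to the optimizer.

```sql
-- The hash is computed in a named subquery and joined to every row
with h as (
  select dbms_crypto.hash(utl_raw.cast_to_raw('HELLO'), 3) as hashed
  from dual
)
select t.a, t.b, t.c, h.hashed
from my_table t
cross join h;
```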