I have nearly 800 SQL scripts, ranging from medium to complex SQL.
The database is Oracle.
I need the list of tables used in each of these scripts. Each script contains numerous WITH clauses, joins, and subqueries. Is there an easier way to achieve this?
Thanks
LN
First store all the queries in one table, scripts, with the columns script_name and script_text.
Then this could work:
select s.script_name, s.script_text, do.object_name
from scripts s
cross join dba_objects do
where instr(upper(s.script_text), upper(do.object_name)) > 0
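One caveat with this approach: instr() matches raw substrings, so an object named EMP will also "match" inside EMPLOYEES or EMP_ROLES. Anchoring the match on word boundaries (in Oracle, regexp_instr can do this) removes most of those false positives. Here is a small Python sketch of the difference; the table names are made up for illustration:

```python
import re

script = "select * from employees e join emp_roles r on r.emp_id = e.id"
objects = ["EMP", "EMPLOYEES", "EMP_ROLES"]

upper = script.upper()

# what instr() does: plain substring matching
substr_hits = [o for o in objects if o in upper]

# word-boundary matching: the object name must stand alone
regex_hits = [o for o in objects if re.search(rf"\b{o}\b", upper)]

print(substr_hits)  # ['EMP', 'EMPLOYEES', 'EMP_ROLES'] -- EMP is a false positive
print(regex_hits)   # ['EMPLOYEES', 'EMP_ROLES']
```

Note that `_` counts as a word character, so EMP still does not fire inside EMP_ROLES or EMP_ID with the boundary version.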
You may try EXPLAIN PLAN for each of the statements and then check the contents of PLAN_TABLE. But this approach:
- doesn't count views (the plan table shows the base tables of the view) or scalar functions
- should be adapted for database links
- should be thoroughly tested with local PL/SQL declarations (WITH FUNCTION) and functions in other schemas
- is influenced by query rewrite and SQL plan baselines
Below is an example:
begin
for i in 1..10 loop
execute immediate
'create table t'
|| i || '( id int)';
end loop;
end;
/
create view v_test
as
select *
from t5
join t6
using(id);
create function f_tab
return sys.odcinumberlist
pipelined
as
begin
null;
end;
/
explain plan for
select *
from t1, t2, t3, t4,
/*View*/
v_test,
/*Pipelined function*/
table(f_tab());
select
object_owner,
object_name,
object_type
from plan_table
where
(object_owner, object_name) in (
select
f.owner,
f.object_name
from all_objects f
where f.object_type in (
'TABLE'
, 'VIEW'
)
) or (object_name, object_type) in (
select
f.object_name,
'PROCEDURE'
from user_objects f
where f.object_type in (
'FUNCTION'
, 'PROCEDURE'
)
)
order by 1;
OBJECT_OWNER | OBJECT_NAME | OBJECT_TYPE
:-------------------------- | :---------- | :----------
FIDDLE_TQYMTNVUFUWHRWJEENKX | T1 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T2 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T3 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T4 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T5 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T6 | TABLE
null | F_TAB | PROCEDURE
Anyway, I think this sort of task should be handled by a SQL parser rather than by any kind of string manipulation, because only a parser knows exactly which objects are used.
I have a number of tables, and many of them have a timestamp column. I can get a list of every table with a timestamp column:
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp';
--------------
table_name
--------------
apples
bananas
sharks
lemons
I can get the highest timestamp in the sharks table as follows:
SELECT MAX(timestamp) FROM sharks;
-------------------------
max
-------------------------
2021-11-24 00:00:00.000
I would like to get a table like
table_name | last_updated
-------------+-------------------------
apples | 2021-11-23 00:02:00.000
bananas | 2019-10-16 00:04:00.000
sharks | 2021-11-24 00:00:00.000
lemons | 1970-01-03 10:00:00.000
I'm suspecting this requires dynamic SQL, so I'm trying something like
SELECT (
EXECUTE 'SELECT MAX(timestamp) FROM my_schema.' || table_name
) FROM (
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp'
);
But it seems like EXECUTE doesn't work in subqueries.
Performance is not particularly a concern, just producing the desired results.
Dynamic queries (EXECUTE) only work inside PL/pgSQL blocks, so you need to wrap your code in one.
I set up test tables similar to yours, with only the "timestamp" column shared between them:
drop table if exists public.sharks_70099803 cascade;
create table public.sharks_70099803
as select 1::int integer_column,
now()::timestamp as "timestamp";
drop table if exists public.bananas_70099803 cascade;
create table public.bananas_70099803
as select 'some text'::text text_column,
now()::timestamp as "timestamp";
Wrap the dynamic query in a PL/pgSQL function. Inside, I build a query that pulls max(timestamp) from each table with that column, aggregate those per-table queries into a single statement with UNION ALL in between, and then EXECUTE it.
CREATE OR REPLACE FUNCTION public.test_70099803()
RETURNS SETOF RECORD
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
return query
execute (
select string_agg(select_per_table,' union all ')
from (
select 'select '''||
table_name||
''' as table_name, max(timestamp) from public.'||
table_name "select_per_table"
from information_schema.columns
where table_schema='public'
and column_name='timestamp'
) a
);
END
$BODY$;
select * from public.test_70099803() as t(table_name text, max_timestamp timestamp);
-- table_name | max_timestamp
--------------------+----------------------------
-- sharks_70099803 | 2021-11-24 17:12:03.24951
-- bananas_70099803 | 2021-11-24 17:12:03.253614
--(2 rows)
You can parametrise your function to be applicable to more groups of tables, or to have a predefined output table structure that'll let you just select * from test_70099803();
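The build-then-execute pattern above is portable. Here is a self-contained sketch of the same idea using Python's stdlib sqlite3 (the table and column names are invented for the demo): find every table with a ts column via the catalog, glue one probe per table together with UNION ALL, and run the combined statement once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table sharks (id integer, ts text);
create table bananas (note text, ts text);
create table rocks (weight real);            -- no ts column
insert into sharks values (1, '2021-11-24'), (2, '2021-11-20');
insert into bananas values ('x', '2019-10-16');
""")

# Find every table that has a "ts" column (the information_schema.columns analogue)
tables = [r[0] for r in conn.execute("""
    select m.name
    from sqlite_master m
    join pragma_table_info(m.name) c on c.name = 'ts'
    where m.type = 'table'
""")]

# Build one UNION ALL query, then execute it -- the same trick as string_agg()
union_sql = " union all ".join(
    f"select '{t}' as table_name, max(ts) from \"{t}\"" for t in tables
)
latest = dict(conn.execute(union_sql).fetchall())
print(latest)  # e.g. {'sharks': '2021-11-24', 'bananas': '2019-10-16'}
```

The design choice is the same as in the PL/pgSQL answer: one generated statement executed once, instead of a loop issuing one query per table.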
The table TB_ORDER has 90 million records, but only 500 of them have a STATE that is neither 'B' nor 'C'.
SELECT
    o.ID, o.STATE, o.NAME
FROM
    TB_ORDER o
WHERE
    o.STATE NOT IN ('B', 'C');
My workmate wrote the SQL like this and it took about 7 minutes because of a full table scan, so I tried to rewrite it as below. Is that OK? I have added an index on the STATE field. Will it still be a full table scan, given that the subquery result is very large ((90000000 - 500) / 90000000 of the rows)?
SELECT
A.ID,A.NAME,A.STATE
FROM TB_ORDER A
WHERE
NOT EXISTS
(
SELECT 1 FROM TB_ORDER B WHERE A.ID=B.ID and B.STATE='B'
UNION ALL
SELECT 1 FROM TB_ORDER C WHERE A.ID=C.ID and C.STATE='C'
)
Do you really need the NOT IN? You could work around it by using a function and then creating a function-based index. Make sure your WHERE clause matches the index expression exactly. Example:
-- table
create table t_large_table (id NUMBER GENERATED ALWAYS AS IDENTITY,state VARCHAR2(1));
-- some sample data
DECLARE
BEGIN
FOR i IN 1 .. 10 LOOP
INSERT INTO t_large_table (state) VALUES ('A');
INSERT INTO t_large_table (state) VALUES ('B');
END LOOP;
INSERT INTO t_large_table (state) VALUES ('C');
INSERT INTO t_large_table (state) VALUES ('D');
COMMIT;
END;
/
-- create an index with a function that buckets all the states relevant to me; in this case everything that is not A or B
CREATE INDEX t_large_table_idx
ON t_large_table (CASE state WHEN 'A' THEN 'A' WHEN 'B' THEN 'B' ELSE 'X' END);
-- run a select with exactly same function as the index
SELECT *
FROM t_large_table
WHERE CASE state WHEN 'A' THEN 'A' WHEN 'B' THEN 'B' ELSE 'X' END = 'X';
-- check explain plan
-----------------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_LARGE_TABLE |
| 2 | INDEX RANGE SCAN | T_LARGE_TABLE_IDX |
-----------------------------------------------------------------
Here is a suggestion you may try:
Select o.ID, o.STATE, o.NAME
FROM TB_ORDER o
Inner Join
(
Select STATE from
(
Select STATE from TB_ORDER Group by STATE
) Q Where STATE NOT IN ('B','C')
) QQ on QQ.STATE = o.STATE
I have been doing some research but didn't find much. I need to compare two tables to get a list of the columns that are in table 1 but not in table 2. I am using Snowflake.
Now, I've found this answer: postgresql - get a list of columns difference between 2 tables
The problem is that when I run the code I get this error:
SQL compilation error: invalid identifier TRANSIENT_STAGE_TABLE
The code works fine if I run it separately, so if I run:
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema' AND table_name = 'table2'
I actually get a list of column names, but when I chain it to the second expression, the above error is returned.
Any hint on what's going on?
Thank you
The query from the original post should work; maybe you're missing single quotes somewhere? See this example:
create or replace table xxx1(i int, j int);
create or replace table xxx2(i int, k int);
-- Query from the original post
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'XXX1'
AND column_name NOT IN
(
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'XXX2'
);
-------------+
COLUMN_NAME |
-------------+
J |
-------------+
You can also write a slightly more complex query to see all columns not matching, from both tables:
with
s1 as (
select table_name, column_name
from information_schema.columns
where table_name = 'XXX1'),
s2 as (
select table_name, column_name
from information_schema.columns
where table_name = 'XXX2')
select * from s1 full outer join s2 on s1.column_name = s2.column_name;
------------+-------------+------------+-------------+
TABLE_NAME | COLUMN_NAME | TABLE_NAME | COLUMN_NAME |
------------+-------------+------------+-------------+
XXX1 | I | XXX2 | I |
XXX1 | J | [NULL] | [NULL] |
[NULL] | [NULL] | XXX2 | K |
------------+-------------+------------+-------------+
You can add WHERE s1.column_name IS NULL or s2.column_name IS NULL to find only missing columns of course.
You can also easily extend it to detect column type differences.
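As a cross-check outside Snowflake, the same column-diff logic is easy to prototype against any catalog. Here is a sketch using Python's stdlib sqlite3 and its pragma table_info (the tables xxx1/xxx2 mirror the example above), which also flags the type differences mentioned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table xxx1 (i int, j int);
create table xxx2 (i text, k int);
""")

def cols(table):
    # pragma table_info rows are (cid, name, type, notnull, dflt_value, pk)
    return {row[1]: row[2] for row in conn.execute(f"pragma table_info({table})")}

a, b = cols("xxx1"), cols("xxx2")

only_in_1 = sorted(set(a) - set(b))  # columns missing from xxx2
type_diffs = sorted(c for c in set(a) & set(b) if a[c] != b[c])

print(only_in_1, type_diffs)  # ['j'] ['i']
```

The full outer join in the Snowflake query plays the same role as the set operations here: set difference finds the missing columns, set intersection plus a type comparison finds the mismatched ones.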
I'm not sure how to formulate this query. I think I need a subquery? Here's basically what I'm trying to do in a single query.
This query gives me the list of tables I need:
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name like 'fact_%';
For the list of tables given, I then want to do a count from each table (each table has the same columns I need to query):
SELECT table_name,
count (domain_key) key_count,
domain_key,
form_created_datetime
FROM (List of tables above)
GROUP BY domain_key,
form_created_datetime;
Can I iterate through each table listed in the first query to do my count, all in a single query?
So expected out would be similar to this:
table_name | key_count | domain_key | form_created_datetime
--------------------------------------------------------------
fact_1 1241 5 2015-09-22 01:47:36.136789
fact_2 32 9 2015-09-22 01:47:36.136789
Example data:
abc_dev_12345=> SELECT *
FROM information_schema.tables
where table_schema='abc_dev_own_12345'
and table_name='fact_1';
table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | use
r_defined_type_name | is_insertable_into | is_typed | commit_action
---------------+-------------------+--------------------+------------+------------------------------+----------------------+---------------------------+--------------------------+----
--------------------+--------------------+----------+---------------
abc_dev_12345 | abc_dev_own_12345 | fact_1 | BASE TABLE | | | | |
| YES | NO |
(1 row)
abc_dev_12345=> SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'abc_dev_own_12345'
AND table_name = 'fact_1';
column_name
------------------------
email_date_key
email_time_key
customer_key
form_created_datetime
client_key
domain_key
Like Eelke and Craig Ringer noted, you need a dynamic query in a plpgsql function. The basic statement you want to apply to each table is:
SELECT <table_name>, count(domain_key) AS key_count, domain_key, form_created_datetime
FROM <table_name> GROUP BY 3, 4
and you want to UNION the lot together.
The most efficient way to do this is to first build a query as a text object from the information in information_schema.tables and then EXECUTE that query. There are many ways to build it, but I particularly like the dirty trick below with string_agg():
CREATE FUNCTION table_domains()
RETURNS TABLE (table_name varchar, key_count bigint, domain_key integer, form_created_datetime timestamp)
AS $$
DECLARE
qry text;
BEGIN
-- format() builds query for individual table
-- string_agg() UNIONs queries from all tables into a single statement
  SELECT string_agg(
    format('SELECT %1$L::varchar, count(domain_key), domain_key, form_created_datetime
            FROM %1$I GROUP BY 3, 4', table_name),
    ' UNION ') INTO qry
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name LIKE 'fact_%';
-- Now EXECUTE the query
RETURN QUERY EXECUTE qry;
END;
$$ LANGUAGE plpgsql;
No need for loops or cursors, so it's pretty efficient.
Use like you would any other table:
SELECT * FROM table_domains();
I need to create a dynamic cross-tab query where the number of columns will not always be fixed, so I can't hard-code them using CASE WHEN. I googled and found a blog about doing this in SQL Server, but I was wondering if there is a similar article or blog on doing it in Oracle (I have not worked in SQL Server). Following is the info about my problem.
The hard-coded cross-tab query I wrote:
SELECT
LU_CITY.CITY_NAME as "City",
count(CASE WHEN emp.emp_category='Admin' THEN emp.emp_id END) As "Admins",
count(CASE WHEN emp.emp_category='Executive' THEN emp.emp_id END) As "Executive",
count(CASE WHEN emp.emp_category='Staff' THEN emp.emp_id END) As "Staff",
count(emp.emp_id) As "Total"
FROM emp, LU_CITY
where
LU_CITY.CITY_ID = EMP.CITY_ID(+)
group by
LU_CITY.CITY_NAME, LU_CITY.CITY_ID
order by
LU_CITY.CITY_ID
tables
emp (emp_id, emp_name, city_id, emp_category)
lu_city(city_id,city_name)
query result
------------------------------------------
City | Admins | Executive | Staff . . . .
------------------------------------------
A | 1 | 2 | 3
B | 0 | 0 | 4
. | . | . | .
.
.
Users can add new emp_category values as per their needs, so the query should generate all such category columns dynamically.
Any guidance in this regard would be highly appreciated.
Thanks in advance
You can use DBMS_SQL dynamic cursors to execute dynamic SQL compiled from a VARCHAR2 variable:
DECLARE
   w_sql                     VARCHAR2 (4000);
   cursor_                   INTEGER;
   v_f1                      NUMBER (6);
   v_f2                      NUMBER (2);
   v_some_value_2_filter_4   NUMBER (2);
   rc                        INTEGER DEFAULT 0;
BEGIN
   -- join as many tables as you need and construct your where clause
   w_sql := 'SELECT f1, f2 FROM table1 t1, table2 t2, ... WHERE t1.f1 = ' || v_some_value_2_filter_4;
   -- open your cursor
   cursor_ := DBMS_SQL.open_cursor;
   DBMS_SQL.parse (cursor_, w_sql, DBMS_SQL.native);
   DBMS_SQL.define_column (cursor_, 1, v_f1);
   DBMS_SQL.define_column (cursor_, 2, v_f2);
   -- execute your SQL
   rc := DBMS_SQL.execute (cursor_);
   WHILE DBMS_SQL.fetch_rows (cursor_) > 0
   LOOP
      -- get the values from the row's columns
      DBMS_SQL.column_value (cursor_, 1, v_f1);
      DBMS_SQL.column_value (cursor_, 2, v_f2);
      -- do what you need with the v_f1 and v_f2 variables
   END LOOP;
   -- always close the cursor when done
   DBMS_SQL.close_cursor (cursor_);
END;
/
Or you can use EXECUTE IMMEDIATE, which is easier to implement if you just need to fetch a single value or run an INSERT/UPDATE/DELETE:
w_sql := 'select f1 from my_table where f1 = :variable';
execute immediate w_sql into v_f1 using 'valor1';
Here is more info about dynamic cursors:
http://docs.oracle.com/cd/B10500_01/appdev.920/a96590/adg09dyn.htm
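Whichever execution mechanism you pick (DBMS_SQL or EXECUTE IMMEDIATE), the dynamic part of the cross-tab is just string building: query the distinct categories first, generate one count(case ...) column per category, and run the assembled statement. Here is a sketch of that generation step in Python with stdlib sqlite3, using the emp/lu_city tables from the question (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table lu_city (city_id int, city_name text);
create table emp (emp_id int, city_id int, emp_category text);
insert into lu_city values (1, 'A'), (2, 'B');
insert into emp values (1, 1, 'Admin'), (2, 1, 'Staff'),
                       (3, 2, 'Staff'), (4, 2, 'Staff');
""")

# Step 1: discover the categories at run time
cats = [r[0] for r in conn.execute(
    "select distinct emp_category from emp order by 1")]

# Step 2: generate one count(case ...) column per category
case_cols = ",\n  ".join(
    f"count(case when e.emp_category = '{c}' then e.emp_id end) as \"{c}\""
    for c in cats)

# Step 3: assemble and run the cross-tab query
sql = f"""
select c.city_name,
  {case_cols},
  count(e.emp_id) as total
from lu_city c left join emp e on e.city_id = c.city_id
group by c.city_name, c.city_id
order by c.city_id
"""
rows = list(conn.execute(sql))
print(rows)  # [('A', 1, 1, 2), ('B', 0, 2, 2)]
```

In Oracle the same generated text would simply be handed to DBMS_SQL or EXECUTE IMMEDIATE (via a SYS_REFCURSOR for a client to fetch from), since the column list must be fixed before the statement is parsed.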