I'm not sure how to formulate this query. I think I need a subquery? Here's basically what I'm trying to do in a single query.
This query gives me the list of tables I need:
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name like 'fact_%';
For each table in that list, I then want to do a count (each table has the same columns I need to query):
SELECT table_name,
count (domain_key) key_count,
domain_key,
form_created_datetime
FROM (List of tables above)
GROUP BY domain_key,
form_created_datetime;
Can I iterate through each table listed in the first query to do my count, all in a single query?
The expected output would be similar to this:
table_name | key_count | domain_key | form_created_datetime
-----------+-----------+------------+---------------------------
fact_1     |      1241 |          5 | 2015-09-22 01:47:36.136789
fact_2     |        32 |          9 | 2015-09-22 01:47:36.136789
Example data:
abc_dev_12345=> SELECT *
FROM information_schema.tables
where table_schema='abc_dev_own_12345'
and table_name='fact_1';
table_catalog | table_schema | table_name | table_type | self_referencing_column_name | reference_generation | user_defined_type_catalog | user_defined_type_schema | use
r_defined_type_name | is_insertable_into | is_typed | commit_action
---------------+-------------------+--------------------+------------+------------------------------+----------------------+---------------------------+--------------------------+----
--------------------+--------------------+----------+---------------
abc_dev_12345 | abc_dev_own_12345 | fact_1 | BASE TABLE | | | | |
| YES | NO |
(1 row)
abc_dev_12345=> SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'abc_dev_own_12345'
AND table_name = 'fact_1';
column_name
------------------------
email_date_key
email_time_key
customer_key
form_created_datetime
client_key
domain_key
As Eelke and Craig Ringer noted, you need a dynamic query in a plpgsql function. The basic statement you want to apply to each table is:
SELECT '<table_name>', count(domain_key) AS key_count, domain_key, form_created_datetime
FROM <table_name> GROUP BY 3, 4
and you want to UNION the lot together.
The most efficient way to do this is to first build a query as a text object from the information in information_schema.tables and then EXECUTE that query. There are many ways to build that query, but I particularly like the below dirty trick with string_agg():
CREATE FUNCTION table_domains()
RETURNS TABLE (table_name varchar, key_count bigint, domain_key integer, form_created_datetime timestamp)
AS $$
DECLARE
qry text;
BEGIN
-- format() builds query for individual table
-- string_agg() UNIONs queries from all tables into a single statement
SELECT string_agg(
-- %1$L emits the table name as a string literal, %1$I as an identifier
format('SELECT %1$L::varchar, count(domain_key), domain_key, form_created_datetime
FROM %1$I GROUP BY 3, 4', table_name),
' UNION ') INTO qry
FROM information_schema.tables
WHERE table_schema = 'abc_dev_12345'
AND table_name LIKE 'fact_%';
-- Now EXECUTE the query
RETURN QUERY EXECUTE qry;
END;
$$ LANGUAGE plpgsql;
No need for loops or cursors, so it's pretty efficient.
Use like you would any other table:
SELECT * FROM table_domains();
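For illustration, the same build-one-UNION-then-execute idea can be sketched outside Postgres, e.g. in Python against SQLite, with sqlite_master standing in for information_schema.tables (the fact_1/fact_2 tables and data below are made up):

```python
import sqlite3

# In-memory database with two hypothetical fact tables sharing the same columns
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_1 (domain_key INTEGER, form_created_datetime TEXT);
    CREATE TABLE fact_2 (domain_key INTEGER, form_created_datetime TEXT);
    INSERT INTO fact_1 VALUES (5, '2015-09-22'), (5, '2015-09-22');
    INSERT INTO fact_2 VALUES (9, '2015-09-22');
""")

# Find matching tables (sqlite_master plays the role of information_schema.tables)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE 'fact_%'")]

# Build one UNION ALL statement, as string_agg() does in the plpgsql version
qry = " UNION ALL ".join(
    f"SELECT '{t}' AS table_name, count(domain_key) AS key_count, "
    f"domain_key, form_created_datetime "
    f"FROM {t} GROUP BY domain_key, form_created_datetime"
    for t in tables)

for row in conn.execute(qry):
    print(row)
```

The string-building step is the whole trick; the database only ever sees one ordinary UNION query.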
Related
I have nearly 800 SQL scripts, ranging from medium to complex SQL.
The database is Oracle.
I need the list of tables used in each of these scripts. Each script contains numerous WITH clauses, joins and subqueries. Is there an easier way to achieve this?
Thanks
LN
First store all the scripts in one table, scripts, with the columns script_name and script_text.
Then this could work (note it is a plain substring match, so it can report false positives when one object name is contained in another):
select script_name, script_text, object_name
from scripts s
join dba_objects do
on 1 = 1
where 1 = 1
and instr(upper(s.script_text), upper(do.object_name)) > 0
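The substring-scan approach (and its over-matching problem) can be sketched in a few lines of Python; the script and object names here are made up:

```python
# A rough sketch of the instr() cross-join approach: scan each script's text
# for every known object name (both sides upper-cased, as in the SQL version).
scripts = {
    "monthly_report.sql": "SELECT * FROM EMP JOIN DEPT USING (DEPTNO)",
    "cleanup.sql": "DELETE FROM EMP_HISTORY WHERE 1 = 1",
}
objects = ["EMP", "DEPT", "EMP_HISTORY"]

matches = [
    (name, obj)
    for name, text in scripts.items()
    for obj in objects
    if obj in text.upper()
]
print(matches)
```

Note that cleanup.sql matches both EMP and EMP_HISTORY, because EMP is a substring of EMP_HISTORY; the instr() query has exactly the same weakness.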
You may try explain plan for each of the statements and then check the contents of plan_table. But this approach:
- doesn't count views (the plan table shows the base tables of the view) or scalar functions
- should be adapted for dblinks
- should be thoroughly tested with local PL/SQL declarations (with function) and functions in other schemas
- is influenced by query rewrite and baselines
Below is an example:
begin
for i in 1..10 loop
execute immediate
'create table t'
|| i || '( id int)';
end loop;
end;
/
create view v_test
as
select *
from t5
join t6
using(id);
create function f_tab
return sys.odcinumberlist
pipelined
as
begin
null;
end;
/
explain plan for
select *
from t1, t2, t3, t4,
/*View*/
v_test,
/*Pipelined function*/
table(f_tab())
select
object_owner,
object_name,
object_type
from plan_table
where
(object_owner, object_name) in (
select
f.owner,
f.object_name
from all_objects f
where f.object_type in (
'TABLE'
, 'VIEW'
)
) or (object_name, object_type) in (
select
f.object_name,
'PROCEDURE'
from user_objects f
where f.object_type in (
'FUNCTION'
, 'PROCEDURE'
)
)
order by 1
OBJECT_OWNER | OBJECT_NAME | OBJECT_TYPE
:-------------------------- | :---------- | :----------
FIDDLE_TQYMTNVUFUWHRWJEENKX | T1 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T2 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T3 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T4 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T5 | TABLE
FIDDLE_TQYMTNVUFUWHRWJEENKX | T6 | TABLE
null | F_TAB | PROCEDURE
Anyway, I think this sort of task should be handled by a SQL parser rather than by string manipulation, because only the parser knows exactly which objects were used.
I have a number of tables, and many of them have a timestamp column. I can get a list of every table with a timestamp column:
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp';
table_name
--------------
apples
bananas
sharks
lemons
I can get the highest timestamp in the sharks table as follows:
SELECT MAX(timestamp) FROM sharks;
-------------------------
max
-------------------------
2021-11-24 00:00:00.000
I would like to get a table like
table_name | last_updated
-------------+-------------------------
apples | 2021-11-23 00:02:00.000
bananas | 2019-10-16 00:04:00.000
sharks | 2021-11-24 00:00:00.000
lemons | 1970-01-03 10:00:00.000
I suspect this requires dynamic SQL, so I'm trying something like:
SELECT (
EXECUTE 'SELECT MAX(timestamp) FROM my_schema.' || table_name
) FROM (
SELECT table_name
FROM information_schema.columns
WHERE table_schema='my_schema' AND column_name='timestamp'
);
But it seems like EXECUTE doesn't work in subqueries.
Performance is not particularly a concern, just producing the desired results.
Dynamic queries cannot run outside of PL/pgSQL blocks, so you need to wrap your code in one.
I set up test tables similar to yours, with only the "timestamp" column shared between them:
drop table if exists public.sharks_70099803 cascade;
create table public.sharks_70099803
as select 1::int integer_column,
now()::timestamp as "timestamp";
drop table if exists public.bananas_70099803 cascade;
create table public.bananas_70099803
as select 'some text'::text text_column,
now()::timestamp as "timestamp";
Wrap the dynamic query in a PL/pgSQL function. Inside, I build a query to pull the max(timestamp) from each table that has the column, then aggregate those into one query with union all in between, which I later execute.
CREATE OR REPLACE FUNCTION public.test_70099803()
RETURNS SETOF RECORD
LANGUAGE 'plpgsql'
AS $BODY$
BEGIN
return query
execute (
select string_agg(select_per_table,' union all ')
from (
select 'select '''||
table_name||
''' as table_name, max(timestamp) from public.'||
table_name "select_per_table"
from information_schema.columns
where table_schema='public'
and column_name='timestamp'
) a
);
END
$BODY$;
select * from public.test_70099803() as t(table_name text, max_timestamp timestamp);
-- table_name | max_timestamp
--------------------+----------------------------
-- sharks_70099803 | 2021-11-24 17:12:03.24951
-- bananas_70099803 | 2021-11-24 17:12:03.253614
--(2 rows)
You can parametrise your function to make it applicable to more groups of tables, or give it a predefined output table structure that lets you just select * from test_70099803();
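The column-driven version of the same trick can also be sketched in Python against SQLite, with pragma_table_info() standing in for information_schema.columns (the tables and timestamps below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apples (timestamp TEXT);
    CREATE TABLE sharks (timestamp TEXT);
    INSERT INTO apples VALUES ('2021-11-23'), ('2021-11-01');
    INSERT INTO sharks VALUES ('2021-11-24');
""")

# Tables that have a "timestamp" column (pragma_table_info stands in for
# information_schema.columns)
tables = [r[0] for r in conn.execute("""
    SELECT m.name FROM sqlite_master m
    JOIN pragma_table_info(m.name) c ON c.name = 'timestamp'
    WHERE m.type = 'table'""")]

# One MAX() per table, glued together with UNION ALL and run as one statement
qry = " UNION ALL ".join(
    f"SELECT '{t}' AS table_name, max(timestamp) AS last_updated FROM {t}"
    for t in tables)

for row in conn.execute(qry):
    print(row)
```

As in the PL/pgSQL answer, the key point is that the dynamic part happens entirely while building the string; executing it is ordinary SQL.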
I am using Oracle SQL Developer. I want to run a SQL query that pulls column_name, data_type and nullable values from a certain table. I can accomplish this by running the following code:
select column_name, data_type, nullable
from all_tab_columns
where table_name = 'mytable'
order by column_id asc
This outputs results as such:
Column_Name | Data_Type | Nullable
-----------------------------------
Column 1 | VARCHAR2 | N
Column 2 | NUMBER | Y
Column 3 | DATE | N
In order to make this information useful to me, I need to transpose this data so that column_name is all one row (right now it's all one column) with its corresponding data below it. It should look something like this:
Column 1 | Column 2 | Column 3
------------------------------
VARCHAR2 | NUMBER | DATE
N | Y | N
Does anyone know the best way to go about doing this? In Teradata this was as easy as running a case expression, but that doesn't seem to be the case (pun intended) in Oracle. Any help is appreciated!
One method is to use listagg() to put everything into one string column:
select listagg(column_name || ',' || data_type || ',' || nullable, ';') within group (order by column_id asc)
from all_tab_columns
where table_name = 'mytable'
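If you need a true transpose rather than one concatenated string, doing the pivot client-side is often simpler. A minimal sketch in Python, where the rows list stands in for the all_tab_columns result:

```python
# Rows as they come back from all_tab_columns: (column_name, data_type, nullable)
rows = [
    ("Column 1", "VARCHAR2", "N"),
    ("Column 2", "NUMBER", "Y"),
    ("Column 3", "DATE", "N"),
]

# Transpose: the column names become the header row, and each remaining
# attribute (data type, nullability) becomes one row underneath it
header = [r[0] for r in rows]
data_types = [r[1] for r in rows]
nullables = [r[2] for r in rows]

for line in (header, data_types, nullables):
    print(" | ".join(f"{v:10}" for v in line))
```

This sidesteps the problem that a SQL pivot needs the output columns known in advance, which is exactly what varies here from table to table.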
I have been doing some research but didn't find much. I need to compare two tables to get a list of the columns that are in table 1 but not in table 2. I am using Snowflake.
Now, I've found this answer: postgresql - get a list of columns difference between 2 tables
The problem is that when I run the code I get this error:
SQL compilation error: invalid identifier TRANSIENT_STAGE_TABLE
The code works fine if I run it separately, so if I run:
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema' AND table_name = 'table2'
I actually get a list of column names, but when I chain it to the second expression, the above error is returned.
Any hint on what's going on?
Thank you
The query from the original post should work; maybe you're missing single quotes somewhere? See this example:
create or replace table xxx1(i int, j int);
create or replace table xxx2(i int, k int);
-- Query from the original post
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'XXX1'
AND column_name NOT IN
(
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'XXX2'
);
-------------+
COLUMN_NAME |
-------------+
J |
-------------+
You can also write a slightly more complex query to see all columns not matching, from both tables:
with
s1 as (
select table_name, column_name
from information_schema.columns
where table_name = 'XXX1'),
s2 as (
select table_name, column_name
from information_schema.columns
where table_name = 'XXX2')
select * from s1 full outer join s2 on s1.column_name = s2.column_name;
------------+-------------+------------+-------------+
TABLE_NAME | COLUMN_NAME | TABLE_NAME | COLUMN_NAME |
------------+-------------+------------+-------------+
XXX1 | I | XXX2 | I |
XXX1 | J | [NULL] | [NULL] |
[NULL] | [NULL] | XXX2 | K |
------------+-------------+------------+-------------+
You can add WHERE s1.column_name IS NULL or s2.column_name IS NULL to find only missing columns of course.
You can also easily extend it to detect column type differences.
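Outside of SQL, the same comparison is just a set difference; a quick sketch in Python with the column lists from the example above:

```python
# Column names as they would come back from information_schema.columns
table1_cols = {"I", "J"}   # columns of XXX1
table2_cols = {"I", "K"}   # columns of XXX2

only_in_1 = table1_cols - table2_cols   # in table 1, missing from table 2
only_in_2 = table2_cols - table1_cols   # in table 2, missing from table 1
print(only_in_1, only_in_2)
```

The NOT IN subquery is the one-sided difference; the full outer join gives you both sides at once, like computing both differences here.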
SELECT POM.TABLE_NAME, POM.COLUMN_NAME
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%'
I want to see all possible values of the columns in the list (in one row if possible). How can I modify this select to do it?
I want something like this:
TABLE_NAME | COLUMN_NAME |VALUES
-----------| ----------- | -------
CAR | COLOR | RED,GREEN
You can use the query below for your requirement. It fetches the distinct values of each matching column.
It can only be used on columns with a limited number of distinct values, since it relies on the LISTAGG function.
SELECT POM.TABLE_NAME, POM.COLUMN_NAME,
XMLTYPE(DBMS_XMLGEN.GETXML('SELECT LISTAGG(COLUMN_NAME,'','') WITHIN GROUP (ORDER BY COLUMN_NAME) VAL
FROM (SELECT DISTINCT '|| POM.COLUMN_NAME ||' COLUMN_NAME
FROM '||POM.OWNER||'.'||POM.TABLE_NAME||')')
).EXTRACT('/ROWSET/ROW/VAL/text()').GETSTRINGVAL() VAL
FROM ALL_TAB_COLUMNS POM
WHERE POM.COLUMN_NAME LIKE '%STATUS%';
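The distinct-value rollup itself is easy to sketch in Python against SQLite, where group_concat() is the closest analogue to LISTAGG (the CAR table and its data below are made up, and group_concat's output order is not guaranteed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (color TEXT, status TEXT);
    INSERT INTO car VALUES ('RED', 'NEW'), ('GREEN', 'NEW'), ('RED', 'USED');
""")

# Roll the distinct values of one column into a single comma-separated string,
# like LISTAGG(...) WITHIN GROUP in the Oracle version
val = conn.execute("""
    SELECT group_concat(color, ',')
    FROM (SELECT DISTINCT color FROM car)
""").fetchone()[0]
print(val)
```

The Oracle answer does the same thing per matching column, but has to go through DBMS_XMLGEN because the column name is only known at runtime.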