Postgres PL/pgSQL To Consolidate Columns Existing Across Various Tables

I am implementing a tool to clean up customer names across the tables of a schema called stage. The customer names come from the columns billing_acc_name or cust_acc_name. I do not know in advance how many tables have these columns, but every table that has one of them is part of the cleanup.
However, before the cleanup, I need to select all unique customer names across the tables in the schema.
For better separation of concerns, I would like to implement this in PL/pgSQL. This is how I currently do it with Python, pandas and SQLAlchemy:
import pandas as pd
from sqlalchemy import text

# `engine` is an existing SQLAlchemy engine connected to the database
table_name = 'information_schema.columns'
table_schema_src = 'stage'
cols = ['billing_acc_name', 'cust_acc_name']
# get list of all table names and column names to query in stage schema
sql = text(f"""
SELECT table_name, column_name FROM {table_name} WHERE table_schema ='{table_schema_src}'
AND column_name = ANY(ARRAY{cols})
""")
src = pd.read_sql(sql, con=engine)
# explore implementation in pgsql
# establish query string
cnames = []
for i, row in src.iterrows():
    s = text(f"""
    SELECT DISTINCT upper({row['column_name']}) AS cname FROM stage.{row['table_name']}
    """)
    cnames.append(str(s).strip())
sql = ' UNION '.join(cnames)
df = pd.read_sql(sql, con=engine)
The auto-generated SQL query string then looks like this:
SELECT DISTINCT upper(cust_acc_name) AS cname FROM stage.journal_2017_companyA UNION
SELECT DISTINCT upper(billing_acc_name) AS cname FROM stage.journal_2017_companyA UNION
SELECT DISTINCT upper(cust_acc_name) AS cname FROM stage.journal_2017_companyB UNION
SELECT DISTINCT upper(billing_acc_name) AS cname FROM stage.journal_2017_companyB UNION
SELECT DISTINCT upper(cust_acc_name) AS cname FROM stage.journal_2017_companyC UNION
SELECT DISTINCT upper(billing_acc_name) AS cname FROM stage.journal_2017_companyC UNION
SELECT DISTINCT upper(cust_acc_name) AS cname FROM stage.journal_2017_companyD UNION
SELECT DISTINCT upper(billing_acc_name) AS cname FROM stage.journal_2017_companyD
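The string-building step above can be isolated into a small, testable function. The following is only a sketch: the helper names quote_ident and build_union_query are made up for illustration, and the (table, column) pairs are hard-coded instead of being read from information_schema. It quotes every identifier, which matters for mixed-case names such as journal_2017_companyA, and it relies on UNION already removing duplicates.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier PostgreSQL-style: wrap in double quotes, double any embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def build_union_query(schema: str, pairs: list[tuple[str, str]]) -> str:
    """Build one UNION query over (table_name, column_name) pairs."""
    selects = [
        f"SELECT upper({quote_ident(col)}) AS cname "
        f"FROM {quote_ident(schema)}.{quote_ident(table)}"
        for table, col in pairs
    ]
    # UNION (without ALL) removes duplicates across all branches,
    # so a per-branch DISTINCT is not required for a unique result.
    return "\nUNION\n".join(selects)


# Hypothetical input; in the real tool these pairs come from information_schema.columns.
pairs = [
    ("journal_2017_companyA", "cust_acc_name"),
    ("journal_2017_companyA", "billing_acc_name"),
]
query = build_union_query("stage", pairs)
print(query)
```

The resulting string can be passed to pd.read_sql exactly like the hand-joined version.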

The plpgsql function may look like this:
create or replace function select_acc_names(_schema text)
returns setof text language plpgsql as $$
declare
    rec record;
begin
    for rec in
        select table_name, column_name
        from information_schema.columns
        where table_schema = _schema
        and column_name = any(array['cust_acc_name', 'billing_acc_name'])
    loop
        return query
        execute format ($fmt$
            select upper(%I) as cname
            from %I.%I
            $fmt$, rec.column_name, _schema, rec.table_name);
    end loop;
end $$;
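Use (the function returns the values of every table without deduplicating them, so add DISTINCT to get unique names):
select distinct cname
from select_acc_names('stage') as t(cname);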

Related

How to merge two tables with different column number in Snowflake?

I am querying the TABLE_SCHEMA, TABLE_NAME, CREATED and LAST_ALTERED columns from the Snowflake INFORMATION_SCHEMA.VIEWS view. Next, I would like to merge that result with the row count of each view. Below are the queries I am running in Snowflake. My issue is that I am not sure how to combine these two results into one table.
Note: I am new to Snowflake. Please provide code with explanation.
Thanks in advance for help!
Query 1
SELECT TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED FROM DB.SCHEMA.VIEWS
WHERE TABLE_SCHEMA='MY_SHEMA' AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3')
Query 2
SELECT COUNT(*) FROM DB.SCHEMA.VIEW_TABLE1
UNION ALL SELECT COUNT(*) FROM DB.SCHEMA.VIEW_TABLE2
The COUNT(*) part of the result needs to be built dynamically and attached to the "driving query".
Sample data:
CREATE VIEW VIEW_TABLE1(c)
AS
SELECT 1;
CREATE VIEW VIEW_TABLE2(e)
AS
SELECT 2 UNION ALL SELECT 4;
CREATE VIEW VIEW_TABLE3(f)
AS
SELECT 3;
Full query:
DECLARE
QUERY STRING;
RES RESULTSET;
BEGIN
SELECT
LISTAGG(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
$$SELECT '<TABLE_SCHEMA>' AS TABLE_SCHEMA,
'<TABLE_NAME>' AS TABLE_NAME,
'<CREATED>' AS CREATED,
'<LAST_ALTERED>' AS LAST_ALTERED,
COUNT(*) AS cnt
FROM <tab_name>
$$,
'<TABLE_SCHEMA>', v.TABLE_SCHEMA),
'<TABLE_NAME>', v.TABLE_NAME),
'<CREATED>', v.CREATED),
'<LAST_ALTERED>', v.LAST_ALTERED),
'<tab_name>', CONCAT_WS('.', v.table_catalog, v.table_schema, v.table_name)),
' UNION ALL ') WITHIN GROUP (ORDER BY CONCAT_WS('.', v.table_catalog, v.table_schema, v.table_name))
INTO :QUERY
FROM INFORMATION_SCHEMA.VIEWS v
WHERE TABLE_SCHEMA='PUBLIC'
AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3');
RES := (EXECUTE IMMEDIATE :QUERY);
RETURN TABLE(RES);
END;
Rationale:
The ideal query would be (pseudocode):
SELECT TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED,
EVAL('SELECT COUNT(*) FROM ' || view_name) AS row_count
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_SCHEMA='MY_SHEMA'
AND TABLE_NAME IN ('VIEW_TABLE1','VIEW_TABLE2','VIEW_TABLE3');
A construct such as EVAL(dynamic query) in the SELECT list does not exist, because it would require building a query on the fly and executing it for each row. Some RDBMSes do offer workarounds, such as Oracle's dbms_xmlgen.getxmltype.
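For readers who prefer to see the "build one statement from metadata rows" idea outside of SQL, here is a rough Python sketch of the same string assembly. The function name build_count_query, the catalog name DB and the hard-coded rows are assumptions for illustration; in practice the rows would come from INFORMATION_SCHEMA.VIEWS, and the generated text would still have to be executed against Snowflake.

```python
def build_count_query(views: list[dict]) -> str:
    """Build one UNION ALL statement returning each view's name plus its row count."""
    branches = []
    for v in views:
        # Fully qualified name, mirroring CONCAT_WS('.', catalog, schema, name).
        fq_name = ".".join([v["TABLE_CATALOG"], v["TABLE_SCHEMA"], v["TABLE_NAME"]])
        branches.append(
            f"SELECT '{v['TABLE_SCHEMA']}' AS TABLE_SCHEMA, "
            f"'{v['TABLE_NAME']}' AS TABLE_NAME, "
            f"COUNT(*) AS cnt FROM {fq_name}"
        )
    # One branch per view, glued together the way LISTAGG(..., ' UNION ALL ') does.
    return " UNION ALL ".join(branches)


# Hypothetical metadata rows standing in for the INFORMATION_SCHEMA.VIEWS result.
views = [
    {"TABLE_CATALOG": "DB", "TABLE_SCHEMA": "PUBLIC", "TABLE_NAME": "VIEW_TABLE1"},
    {"TABLE_CATALOG": "DB", "TABLE_SCHEMA": "PUBLIC", "TABLE_NAME": "VIEW_TABLE2"},
]
query = build_count_query(views)
print(query)
```

The sketch leaves out CREATED and LAST_ALTERED for brevity; they would be added as further quoted literals in each branch.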
Include the table/view names as strings in your COUNT(*) queries, and then you can join on them.
Example below:
select * from
(SELECT TABLE_SCHEMA,TABLE_NAME,CREATED FROM information_schema.tables
WHERE TABLE_SCHEMA='PUBLIC' AND TABLE_NAME IN ('D1','D2')) t1
left join
(
SELECT 'D1' table_name, COUNT(*) FROM d1
UNION ALL SELECT 'D2',COUNT(*) FROM d2) t2
on t1.table_name = t2.table_name ;
TABLE_SCHEMA  TABLE_NAME  CREATED                        TABLE_NAME  COUNT(*)
------------  ----------  -----------------------------  ----------  --------
PUBLIC        D1          2022-04-06 14:24:56.224 -0700  D1          12
PUBLIC        D2          2022-04-06 14:25:27.276 -0700  D2          5

Query to count all rows in all Snowflake views

I'm trying to get a count of all the rows in a set of views in my Snowflake database.
The built-in row_count from information_schema.tables is not present in information_schema.views, unfortunately.
It seems I'd need to count all rows in each view, something like:
with view_name as (select table_name
from account_usage.views
where table_schema = 'ACCESS' and RIGHT(table_name,7) = 'CURRENT'
)
select count (*) from view_name;
But that returns only one result, instead of one for each view.
If I change the select to include the view name, i.e.
select concat('Rows in ', view_name), count (*) from view_name;
…it returns the error "invalid identifier 'VIEW_NAME' (line 5)"
How can I show all results and include the view name?
You can query the information_schema to generate a statement that goes view by view and gets each count, and then run the generated statement:
select listagg(xx, ' union all ')
from (
select 'select count(*) c, \'' || x || '\' v from ' || x as xx
from (
select TABLE_CATALOG ||'.'|| TABLE_SCHEMA ||'."'||TABLE_NAME||'"' x
from KNOEMA_FORECAST_DATA_ATLAS.INFORMATION_SCHEMA.VIEWS
where table_schema='FORECAST'
)
)
See also How to find the number of rows for all views in a schema?

How to extract the table name from a CREATE/UPDATE/INSERT statement in an SQL query?

I am trying to parse the table being created, inserted into or updated from the following sql queries stored in a table column.
Let's call the table column query. Following is some sample data to demonstrate variations in how the data could look like.
with sample_data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all
select 8 as id, 'DELETE tbl4 ...' as query
),
Following are the formats of the queries (we are trying to extract table_name ):
#1
some optional statements like drop table
CREATE some comments or optional statement like OR REPLACE TABLE table_name
everything else
#2
some optional statements like drop table
INSERT some comments INTO some comments table_name
#3
some optional statements like drop table
UPDATE some comments table_name
everything else
Regular Expression
To construct a suitable regex, let's start with the following relatively simple/readable version:
((CREATE( OR REPLACE)?|DROP) TABLE( IF EXISTS)?|UPDATE|DELETE|INSERT INTO) ([^\s\/*]+)
All the spaces above could be replaced with "at least one whitespace character", i.e. \s+. But we also need to allow comments. For a comment that looks like /*anything*/ the regex looks like \/\*.*\*\/ (where the comment characters are escaped with \ and "anything" is the .* in the middle). Given there could be multiple such comments, optionally separated by whitespace, we end up with (\s*\/\*.*\*\/\s*?)*\s+. Plugging this in everywhere there was a space gives:
((CREATE((\s*\/\*.*\*\/\s*?)*\s+OR(\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(\s*\/\*.*\*\/\s*?)*\s+TABLE((\s*\/\*.*\*\/\s*?)*\s+IF(\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(\s*\/\*.*\*\/\s*?)*\s+INTO)(\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)
One further refinement needs to be made: Bracketed expressions have been used for choices, e.g. (CHOICE1|CHOICE2). But this syntax includes them as capturing groups. Actually we only require one capturing group for the table name so we can exclude all the other capturing groups via ?:, e.g. (?:CHOICE1|CHOICE2). This gives:
(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)
Online Regex Demo
Here's a demo of it working with your examples: Regex101 demo
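As a quick offline check, the final pattern can also be exercised with Python's re module. This is only a sketch: it assumes Python's regex engine treats this pattern the same way as the target engine for these inputs, and the helper name extract_table is made up for illustration. Note that for the DROP ... ; CREATE ... sample the first match is the DROP statement, so the captured name keeps the trailing semicolon.

```python
import re

# The non-capturing version of the pattern from the text, split over lines but otherwise unchanged.
PATTERN = re.compile(
    r"(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)"
    r"(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?"
    r"|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)"
    r"(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)"
)


def extract_table(query: str):
    """Return the first captured table name, or None when nothing matches."""
    match = PATTERN.search(query)
    return match.group(1) if match else None


samples = [
    "CREATE TABLE tbl1 ...",
    "CREATE OR REPLACE TABLE tbl1 ...",
    "DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...",
    "INSERT /*some comment*/ INTO tbl2 ...",
    "UPDATE tbl3 SET col1 = ...",
    "/*some garbage comments*/ UPDATE tbl3 SET col1 = ...",
    "DELETE tbl4 ...",
]
results = [extract_table(s) for s in samples]
for sample, name in zip(samples, results):
    print(f"{name!r:10} <- {sample}")
```

If the trailing semicolon is unwanted, adding ; to the excluded characters of the capturing group is one possible adjustment.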
SQL
The Google BigQuery documentation for REGEXP_EXTRACT says it will return the substring matched by the capturing group. So I'd expect something like this to work:
with sample_data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'CREATE OR REPLACE TABLE tbl1 ...' as query union all
select 3 as id, 'DROP TABLE IF EXISTS tbl1; CREATE TABLE tbl1 ...' as query union all
select 4 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 5 as id, 'INSERT /*some comment*/ INTO tbl2 ...' as query union all
select 6 as id, 'UPDATE tbl3 SET col1 = ...' as query union all
select 7 as id, '/*some garbage comments*/ UPDATE tbl3 SET col1 = ...' as query union all
select 8 as id, 'DELETE tbl4 ...' as query
)
SELECT
*, REGEXP_EXTRACT(query, r"(?:(?:CREATE(?:(?:\s*\/\*.*\*\/\s*?)*\s+OR(?:\s*\/\*.*\*\/\s*?)*\s+REPLACE)?|DROP)(?:\s*\/\*.*\*\/\s*?)*\s+TABLE(?:(?:\s*\/\*.*\*\/\s*?)*\s+IF(?:\s*\/\*.*\*\/\s*?)*\s+EXISTS)?|UPDATE|DELETE|INSERT(?:\s*\/\*.*\*\/\s*?)*\s+INTO)(?:\s*\/\*.*\*\/\s*?)*\s+([^\s\/*]+)") AS table_name
FROM sample_data;
(The above is untested so please let me know in the comments if there are any issues.)
I think it really depends on your data, but you might find some success using an approach like this:
with data as (
select 1 as id, 'CREATE TABLE tbl1 ...' as query union all
select 2 as id, 'INSERT INTO tbl2 ...' as query union all
select 3 as id, 'UPDATE tbl3 ...' as query union all
select 4 as id, 'DELETE tbl4 ...' as query
),
splitted as (
select id, split(query, ' ') as query_parts from data
)
select
id,
case
when query_parts[safe_offset(0)] in('CREATE', 'INSERT') then query_parts[safe_offset(2)]
when query_parts[safe_offset(0)] in('UPDATE', 'DELETE') then query_parts[safe_offset(1)]
else 'Error'
end as table_name
from splitted
Of course this depends on the cleanliness and syntax of your query column. Also, if your table_name is qualified as project.dataset.table, you would need to do further splitting.

Oracle SQL : Retrieving non-existing values from IN clause

Given the following query:
select table_name
from user_tables
where table_name in ('A','B','C','D','E','F');
Assuming only the user_tables records B, C and F exist, I want to retrieve the non-existing values A, D and E. This is a simple example; in the real world the list can be huge.
A good way to generate fake rows is with a standard collection such as sys.odcivarchar2list:
select
tables_to_check.table_name,
case when user_tables.table_name is null then 'No' else 'Yes' end table_exists
from
(
select column_value table_name
from table(sys.odcivarchar2list('does not exist', 'TEST1'))
) tables_to_check
left join user_tables
on tables_to_check.table_name = user_tables.table_name
order by tables_to_check.table_name;
TABLE_NAME TABLE_EXISTS
---------- ------------
TEST1 Yes
does not exist No
If you have the list of all tables to be checked in Table1, then you can use a NOT EXISTS clause:
select name
from Table1 T1
where not exists ( select 1 from
user_tables U
where T1.name = U.table_name)
The only way is to use NOT EXISTS after converting the IN-clause string into a table of values (a CTE).
This is not a clean solution, though, because the maximum length of the IN-clause string is only 4000, including the commas.
WITH MY_STRING(str) AS
(
SELECT q'#'A','B','C','D','E','F'#' FROM DUAL
),
VALUES_TABLE AS
(
SELECT TRIM(BOTH '''' FROM REGEXP_SUBSTR(str,'[^,]+',1,level)) as table_name FROM MY_STRING
CONNECT BY LEVEL <= REGEXP_COUNT(str,',') + 1
)
SELECT ME.* FROM VALUES_TABLE ME
WHERE NOT EXISTS
(SELECT 'X' FROM user_tables u
WHERE u.table_name = ME.table_name);
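The logic of this query (split the quoted list into values, then keep the values with no matching table) can be sketched in a few lines of Python. The function names and the hard-coded list of existing tables are assumptions for illustration only. One detail worth noticing is that a list with five commas holds six values, so any level or loop bound derived from the comma count has to be the count plus one.

```python
def split_quoted_list(s: str) -> list[str]:
    """Split "'A','B','C'" into ['A', 'B', 'C'] (assumes no commas inside the values)."""
    return [part.strip().strip("'") for part in s.split(",")]


def missing_names(wanted: list[str], existing: list[str]) -> list[str]:
    """Return the wanted names that do not occur in existing, keeping their order."""
    existing_set = set(existing)
    return [name for name in wanted if name not in existing_set]


in_list = "'A','B','C','D','E','F'"
wanted = split_quoted_list(in_list)
# Hypothetical stand-in for the table names returned by user_tables.
existing_tables = ["B", "C", "F"]
missing = missing_names(wanted, existing_tables)
print(missing)
```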
You can't. These values have to be entered into a temporary table at the least to do the desired operation. Also, Oracle's IN clause list cannot be huge (i.e., not more than 1000 values).
Are you restricted to receiving those values as a comma-delimited list?
Instead of creating a comma-delimited list with the source values, populate an array (or a table).
Pass the array into a PL/SQL procedure (or open a cursor on the table).
Loop through the array (cursor) and use a dynamic cursor to select count(table_name) from user_tables where table_name = value_pulled.
Insert into table B when count(table_name) = 0.
Then you can select everything from table B.
select * from tab1;
------------------
A
B
C
D
E
F
create or replace procedure proc1 as
  cursor c is select col1 from tab1;
  r tab1.col1%type;
  i number;
  v_sql varchar2(4000);  -- holds the generated INSERT statement
begin
  open c;
  loop
    fetch c into r;
    exit when c%notfound;
    select count(tname) into i from tab where tname = r;
    if i = 0 then
      -- the value is wrapped in single quotes inside the generated statement
      v_sql := 'insert into tab2 values (''' || r || ''')';
      execute immediate v_sql;
      commit;
    end if;
  end loop;
  close c;
end proc1;
select * from tab2;
------------------
A
D
E
If this is not a one-off, then having this proc on hand will be handy.

Replacing concat operation in Where clause

We need to query the ALL_TABLES view based on a combination of schema name and table name.
There are two schemas, "A" and "B", and both contain a table "TAB1". My requirement is to select the table associated with schema A and not the one in schema B.
Currently, we achieve this by concatenating the owner name and the table name, as shown below.
A single query can contain multiple owner and table name combinations:
select table_name from all_tables where concat(owner_name,table_name) in ('ATAB1','ATAB2','BTAB2','CTAB1')
select table_name from all_tables where concat(owner_name,table_name) not in ('ATAB1','ATAB2','BTAB2','CTAB1')
Here there are three schemas, A, B and C, with their respective table name combinations.
How can we achieve the same result without using the CONCAT function?
WHERE 0=1
OR (owner_name = 'A' AND table_name = 'T1')
OR (owner_name = 'B' AND table_name = 'T2')
OR (owner_name = 'A' AND table_name = 'T3')
The strange 0=1 is just there to make the lines below it syntactically identical, for easy maintenance and/or code generation. The optimizer removes it.
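To illustrate the code-generation point, here is a minimal Python sketch that emits such a predicate from a list of (owner, table) pairs. The helper names sql_literal and build_where are hypothetical, and the column names are taken as written in the snippet above. Because the clause starts with 0=1, every pair produces an identical OR line, and an empty list still yields a valid predicate that matches nothing.

```python
def sql_literal(value: str) -> str:
    """Render a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_where(pairs: list[tuple[str, str]]) -> str:
    """Generate a WHERE clause with one OR line per (owner, table) pair."""
    lines = ["WHERE 0=1"]
    for owner, table in pairs:
        lines.append(
            f"   OR (owner_name = {sql_literal(owner)} AND table_name = {sql_literal(table)})"
        )
    return "\n".join(lines)


where = build_where([("A", "TAB1"), ("A", "TAB2"), ("B", "TAB2"), ("C", "TAB1")])
print(where)
```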
Oracle allows for multiple columns in an IN condition (see the documentation for some more examples).
select table_name
from all_tables
where (owner_name, table_name) in
(('A','TAB1'), ('A','TAB2'), ('B','TAB2'), ('C','TAB1'))
This would probably be equivalent to usr's answer in terms of performance.
You could arrange the string values you need to match against into a virtual table, then use that table in a join as a filter:
SELECT t.*
FROM all_tables t
INNER JOIN (
SELECT 'A' AS owner_name, 'TAB1' AS table_name FROM DUAL
UNION ALL SELECT 'A', 'TAB2' FROM DUAL
UNION ALL SELECT 'B', 'TAB2' FROM DUAL
UNION ALL SELECT 'C', 'TAB1' FROM DUAL
) s
ON t.owner_name = s.owner_name
AND t.table_name = s.table_name
;
I would expect this to give the query planner more room for optimisation than your present approach gives.