How to Select * FROM all tables in a schema - sql

I have 100s of tables with the same structure in the same schema. I would like to run a query to see all rows where the 'sqft' column is NULL
SELECT * FROM table WHERE sqft = NULL
The tables I would like to iterate over all begin with the prefix 'tb_'
e.g 'tb_115_spooner_st'
After trying numerous solutions posted on here I cannot properly iterate over all these tables with a single script.
This is what I am currently working with
do $$
declare
rec record;
query text;
begin
for rec in select * from pg_tables where schemaname = 'public'
loop
query = format('SELECT * FROM %s WHERE sqft = NULL LIMIT 1', rec.tablename);
--raise notice '%', query;
execute query;
end loop;
end
$$ language plpgsql;
I am quite new to writing more complex SQL commands like this and have trouble understanding what is going wrong. I know there needs to be a section where the prefix is a condition, but the code running right now just returns a 'DO' in the console. Any help is appreciated.

Consider using the INFORMATION_SCHEMA.COLUMNS view to find the tables you need to query.
SELECT
CONCAT('SELECT * FROM ', table_name,' WHERE sqft IS NULL;')
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
COLUMN_NAME = 'sqft'
This will get you a list of SQL statements you can copy and paste into a new terminal and just run as a normal query.

Related

How can I loop to get counts from multiple tables?

I am trying to derive a table with counts from multiple tables. The tables are not on my schema. The table names on the schema that I am interested in all start with 'STAF_' and end with '_TS'. The criteria i am looking for is where SEP = 'MO'. So for example, the query in its base form is:
select area, count(SEP) areacount
from mous.STAF_0001_TS
where SEP = 'MO'
group by area;
I have about 1000 tables that i'd like to do this for.
Ultimatly, I'd like the output to be a table on my schema that looks like the following:
area| areacount
0001| 3
0002| 7
0003| 438
Thank you.
As a first step I'd write an SQL query that generates an SQL query:
SELECT 'SELECT area, count(*) FROM '||c.table_name||'UNION ALL' as run_me
FROM all_tables c
WHERE c.table_name LIKE 'STAF\_%\_MS' escape '\'
Running this will produce an output that is another SQL query. Copy the result text out of your results grid and paste it back into your query pane. Delete the final UNION ALL and run it
Once you dig how to write an SQL query that generate an SQL query, you can look at turning it into a view, or creating a dynamic query in a string.
Gotta say, this is a horrible way to store data; you'd be better off using ONE table with an extra column containing whatever is in xxx of STAF_xxx_MS right now
In Oracle 12c, you can embed a FUNCTION that will query the number of rows in any given table. Then you can use that function in your main query. Here is an example:
WITH FUNCTION cnt ( p_owner VARCHAR2, p_table_name VARCHAR2 ) RETURN NUMBER IS
l_cnt NUMBER;
BEGIN
EXECUTE IMMEDIATE 'SELECT count(*) INTO :cnt FROM ' || p_owner || '.' || p_table_name INTO l_cnt;
RETURN l_cnt;
EXCEPTION WHEN OTHERS THEN
RETURN NULL; -- This will happen for entries in ALL_TABLES that are not directly accessible (e.g., IOT overflow tables)
END cnt;
SELECT t.owner, t.table_name, cnt(t.owner, t.table_name)
FROM all_tables t
where t.table_Name like 'STAF\_%\_MS' escape '\';

PostgreSQL Function returning result set from dynamic tables names

In my database, I have the standard app tables and backup tables. Eg. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table. I use it to restore the employee table between tests.
I'd figure I can use these "bak_" tables to see the changes that have occurred during the test like this:
SELECT * FROM employee EXCEPT SELECT * FROM bak_employee
This will show me the inserted and updated records. I'll ignore the deleted records for now.
Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF diff_tables AS
$BODY$
DECLARE
app_tables text;
BEGIN
FOR app_tables IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name not like 'bak_%' -- exclude existing backup tables
LOOP
-- somehow loop through tables to see what's changed something like:
EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
But obviously this isn't going to return me any useful information. Any help would be appreciated.
You cannot return various well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type.
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF text AS -- text!!
$func$
DECLARE
app_table text;
BEGIN
FOR app_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name NOT LIKE 'bak_%' -- exclude existing backup tables
LOOP
RETURN NEXT ' ';
RETURN NEXT '=== ' || app_table || ' ===';
RETURN QUERY EXECUTE format(
'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
, app_table, 'bak_' || app_table);
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.show_diff();
I had the test suggested by #a_horse at first, but after your comment I realized that there is no need for this. EXCEPT considers NULL values to be equal and shows all differences.
While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: cheaper and does not run the risk of folding complete duplicates.
Using EXCEPT clause in PostgreSQL
TABLE is just syntactical sugar.
Is there a shortcut for SELECT * FROM in psql?
However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.
Crucial element is the cast the row type to text (x::text).
You can even make the function work for any table - but never more than one at a time: With a polymorphic parameter type:
Refactor a PL/pgSQL function to return the output of various SELECT queries

Strange 'Unknown table' when performing plpgsql script

I have one same table in several schemas from PostgreSQL database server. I need
execute one query like below:
CREATE OR REPLACE FUNCTION git_search() RETURNS SETOF git_log AS $$
DECLARE
sch name;
BEGIN
FOREACH sch IN
select schema_name from information_schema.schemata where schema_name not in ('pg_toast','pg_temp_1','pg_toast_temp_1','pg_catalog','information_schema')
LOOP
qry := 'select count(*) from'|| quote_ident(sch) || '.git_log gl where gl.author_contributor_id = 17';
RETURN QUERY qry;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
select git_search();
but I have the error:
ERROR: "git_log" type not exists
SQL state: 42704
The git_log table is unknown in the first line of script. (clause CREATE)
Anybody can help me?
There are more than 100 schemas where I need perform the query that is adjusted for this situation. What is the best way to do this? Where I can create the function for this purpose?
The table name would serve just fine as composite type name, because a composite type of the same name (schema-qualified) is created with every table automatically.
The immediate cause of the error: none of your tables (actually the associated composite type of the same name) named git_log can be found in the current search_path, so the type name cannot be resolved.
Since you are operating with many schemas and many instances of tables called git_log, you need to be unambiguous and schema-qualify the table name. Just pick any one of your tables in one of the schemas, they all share the same layout:
But the rest of your function isn't going to work either. It's not a "plpgsql script", but a function definition. Try this:
CREATE OR REPLACE FUNCTION git_search()
RETURNS SETOF one_schema.git_log AS
$func$
DECLARE
sch text;
BEGIN
FOR sch IN
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name NOT LIKE 'pg_%'
AND schema_name <> 'information_schema'
ORDER BY schema_name
LOOP
RETURN QUERY EXECUTE format(
'SELECT count(*)
FROM %I.git_log
WHERE author_contributor_id = 17', sch);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM git_search();
Major points
FOREACH is for looping though arrays, you need a FOR loop.
You need dynamic SQL. Search for examples with more explanation.
Call the function with SELECT * FROM.
Related answer (one of many):
Passing column names dynamically for a record variable in PostgreSQL

Update multiple columns that start with a specific string

I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the affect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
,string_agg (quote_ident(attname), ', ')
,string_agg ('NULL', ', ')
)
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL-injection - also necessary for identifiers.
string_agg() requires 9.0+.
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
, OUT row_ct int, OUT col_ct bigint)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text;
BEGIN
SELECT INTO _sql, col_ct
format('UPDATE tbl SET (%s) = (%s)'
, string_agg (quote_ident(attname), ', ')
, string_agg ('NULL', ', ')
)
, count(*)
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped columns
AND attnum > 0 -- no system columns
AND attname LIKE _col_pattern; -- only columns matching pattern
-- RAISE NOTICE '%', _sql; -- output SQL for debugging
EXECUTE _sql;
GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise to do so if you (can) have multiple schemas in your db! See:
Table name as a PostgreSQL function parameter
db<>fiddle here
Old sqlfiddle
There's no handy shortcut sorry. If you have to do this kind of thing a lot, you could create a function to dynamically execute sql and achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$ BEGIN
EXECUTE (select 'UPDATE table SET '
|| array_to_string(array(
select column_name::text
from information_schema.columns
where table_name = 'table'
and column_name::text like 'contact_info_address_%'
),' = NULL,')
|| ' = NULL');
RETURN true;
END; $$ LANGUAGE plpgsql;
-- run the function
SELECT reset_cols();
It's not very nice though. A better function would be one that accepts the tablename and column prefix as args. Which I'll leave as an exercise for the readers :)

How to choose tables on select from all_tables?

I have the following table name template, there are a couple with the same name and a number at the end: fmj.backup_semaforo_geo_THENUMBER, for example:
select * from fmj.backup_semaforo_geo_06391442
select * from fmj.backup_semaforo_geo_06398164
...
Lets say I need to select a column from every table which succeeds with the 'fmj.backup_semaforo_geo_%' filter, I tried this:
SELECT calle --This column is from the backup_semaforo_geo_# tables
FROM (SELECT table_name
FROM all_tables
WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%');
But I'm getting the all_tables tables name data:
TABLE_NAME
----------
BACKUP_SEMAFORO_GEO_06391442
BACKUP_SEMAFORO_GEO_06398164
...
How can I achieve that without getting the all_tables output?
Thanks.
Presumably your current query is getting ORA-00904: "CALLE": invalid identifier, because the subquery doesn't have a column called CALLE. You can't provide a table name to a query at runtime like that, unfortunately, and have to resort to dynamic SQL.
Something like this will loop through all the tables and for each one will get all the values of CALLE from each one, which you can then loop through. I've used DBMS_OUTPUT to display them, assuming you're doing this in SQL*Plus or something that can deal with that; but you may want to do something else with them.
set serveroutput on
declare
-- declare a local collection type we can use for bulk collect; use any table
-- that has the column, or if there isn't a stable one use the actual data
-- type, varchar2(30) or whatever is appropriate
type t_values is table of table.calle%type;
-- declare an instance of that type
l_values t_values;
-- declare a cursor to generate the dynamic SQL; where this is done is a
-- matter of taste (can use 'open x for select ...', then fetch, etc.)
-- If you run the query on its own you'll see the individual selects from
-- all the tables
cursor c1 is
select table_name,
'select calle from ' || owner ||'.'|| table_name as query
from all_tables
where owner = 'FMJ'
and table_name like 'BACKUP_SEMAFORO_GEO%'
order by table_name;
begin
-- loop around all the dynamic queries from the cursor
for r1 in c1 loop
-- for each one, execute it as dynamic SQL, with a bulk collect into
-- the collection type created above
execute immediate r1.query bulk collect into l_values;
-- loop around all the elements in the collection, and print each one
for i in 1..l_values.count loop
dbms_output.put_line(r1.table_name ||': ' || l_values(i));
end loop;
end loop;
end;
/
May be a dynamic SQL in a PLSQL program;
for a in (SELECT table_name
FROM all_tables
WHERE owner = 'FMJ' AND table_name LIKE 'BACKUP_SEMAFORO_GEO_%')
LOOP
sql_stmt := ' SELECT calle FROM' || a.table_name;
EXECUTE IMMEDIATE sql_stmt;
...
...
END LOOP;