Is it possible to select only tables where a column (AD_CLIENT_ID) contains a certain numeric value (1000000)?
I have 850 tables to select from and each table has this column, but not all of them contain the value 1000000.
Hmm... strongly reconsider your design.
One option: make a view as a huge UNION structure of all your tables, then query that WHERE AD_CLIENT_ID = 1000000.
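A minimal sketch of that idea, with made-up tables tbl_a, tbl_b and tbl_c standing in for the 850 real ones:
CREATE VIEW v_ad_client AS
SELECT 'tbl_a'::text AS source_table, ad_client_id FROM tbl_a
UNION ALL
SELECT 'tbl_b', ad_client_id FROM tbl_b
UNION ALL
SELECT 'tbl_c', ad_client_id FROM tbl_c;

-- which tables contain the value at least once?
SELECT DISTINCT source_table
FROM   v_ad_client
WHERE  ad_client_id = 1000000;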
You could get the list of tables with a plpgsql function executing dynamic SQL, looping through the results from the catalog. Consider the following demo, tested on PostgreSQL 9.1; it should also work on version 8.4 or even older:
CREATE OR REPLACE FUNCTION f_tbl_with_value(numeric)
  RETURNS SETOF text AS
$BODY$
DECLARE
   _tbl text;
BEGIN
   FOR _tbl IN
      SELECT DISTINCT quote_ident(n.nspname) || '.' || quote_ident(c.relname)
      FROM   pg_catalog.pg_attribute a
      JOIN   pg_catalog.pg_class     c ON c.oid = a.attrelid
      JOIN   pg_catalog.pg_namespace n ON n.oid = c.relnamespace
      WHERE  a.attname = 'ad_client_id'
      AND    a.attisdropped = FALSE  -- column hasn't been dropped
      AND    n.nspname = 'myschema'  -- search only this schema
      AND    c.relkind = 'r'         -- only real tables
   LOOP
      RETURN QUERY EXECUTE '
         SELECT ''' || _tbl || '''::text
         WHERE  EXISTS (
            SELECT *
            FROM   ' || _tbl || '
            WHERE  ad_client_id = $1
            )'
      USING $1;
   END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
Call:
SELECT * FROM x.f_tbl_with_value(1000000);
Major points
Result is a list of table names where the column ad_client_id exists and contains the parameter value in at least one row.
I use the PostgreSQL catalog. You can do the same with the standard SQL information schema, as you demonstrate in your comment, but that's a lot slower and only useful if you want to keep your code portable. As this plpgsql function is not portable to other database systems anyway, I use the faster PostgreSQL catalog tables.
Note how I use plain SQL to retrieve the table names, but dynamic SQL to query them.
Note how I use quote_ident() on schema and table names to safeguard against SQL injection and automatically preserve mixed-case identifiers if need be.
I used the lower-case string 'ad_client_id' for the column name. Or do you actually need upper case because you double-quoted "AD_CLIENT_ID" when creating it? (I generally advise to use lower-case identifiers only. More about that in the manual.)
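To illustrate the case-folding point with a minimal demo (table names made up): unquoted identifiers are folded to lower case, while double-quoted ones keep their case and must be double-quoted ever after:
CREATE TABLE t1 (AD_CLIENT_ID numeric);    -- identifier is folded to ad_client_id
CREATE TABLE t2 ("AD_CLIENT_ID" numeric);  -- keeps its case, must always be double-quoted

SELECT ad_client_id   FROM t1;  -- works
SELECT ad_client_id   FROM t2;  -- ERROR: column "ad_client_id" does not exist
SELECT "AD_CLIENT_ID" FROM t2;  -- works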
Related
I would like to drop all tables in a Redshift schema. Even though this solution works
DROP SCHEMA public CASCADE;
CREATE SCHEMA public;
is NOT good for me since it drops the SCHEMA permissions as well.
A solution like
DO $$ DECLARE
r RECORD;
BEGIN
-- if the schema you operate on is not "current", you will want to
-- replace current_schema() in query with 'schematodeletetablesfrom'
-- *and* update the generated 'DROP...' accordingly.
FOR r IN (SELECT tablename FROM pg_tables WHERE schemaname = current_schema()) LOOP
EXECUTE 'DROP TABLE IF EXISTS ' || quote_ident(r.tablename) || ' CASCADE';
END LOOP;
END $$;
as reported in the thread How can I drop all the tables in a PostgreSQL database?
would be ideal. Unfortunately it doesn't work on Redshift (apparently there is no support for FOR loops).
Is there another way to achieve this?
Run this SQL and copy+paste the result into your SQL client.
If you want to do it programmatically, you need to build a little bit of code around it.
SELECT 'DROP TABLE IF EXISTS ' || tablename || ' CASCADE;'
FROM pg_tables
WHERE schemaname = '<your_schema>'
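Note that the generated statements resolve the table names against your current search_path. A variant that schema-qualifies each DROP explicitly (same pg_tables columns, just concatenated in) might look like:
SELECT 'DROP TABLE IF EXISTS ' || schemaname || '.' || tablename || ' CASCADE;'
FROM pg_tables
WHERE schemaname = '<your_schema>'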
I solved it with a procedure that deletes all records. Using this technique with TRUNCATE fails, but DELETE works fine for my intents and purposes.
CREATE OR REPLACE PROCEDURE sp_truncate_dwh() AS $$
DECLARE
   tables RECORD;
BEGIN
   FOR tables IN SELECT tablename
                 FROM pg_tables
                 WHERE schemaname = 'dwh'
                 ORDER BY tablename
   LOOP
      EXECUTE 'DELETE FROM dwh.' || quote_ident(tables.tablename);
   END LOOP;
   RETURN;
END;
$$ LANGUAGE plpgsql;
-- CALL sp_truncate_dwh();
In addition to demircioglu's answer, I had to add a COMMIT after every drop statement to drop all the tables in my schema:
SELECT 'DROP TABLE IF EXISTS ' || tablename || ' CASCADE; COMMIT;'
FROM pg_tables
WHERE schemaname = '<your_schema>'
Using Python and psycopg2 locally on my PC, I came up with this script to delete all tables in a schema:
import logging
import psycopg2

logger = logging.getLogger(__name__)

schema = "schema_to_be_deleted"
conn = None  # so the finally block is safe even if connect() fails
try:
    conn = psycopg2.connect("dbname='{}' port='{}' host='{}' user='{}' password='{}'".format("DB_NAME", "DB_PORT", "DB_HOST", "DB_USER", "DB_PWD"))
    cursor = conn.cursor()
    # parameterized query instead of string interpolation
    cursor.execute("SELECT tablename FROM pg_tables WHERE schemaname = %s", (schema,))
    rows = cursor.fetchall()
    for row in rows:
        cursor.execute("DROP TABLE {}.{}".format(schema, row[0]))
    cursor.close()
    conn.commit()
except psycopg2.DatabaseError as error:
    logger.error(error)
finally:
    if conn is not None:
        conn.close()
Replace the values of DB_NAME, DB_PORT, DB_HOST, DB_USER and DB_PWD with the correct ones to connect to the Redshift DB.
The following recipe differs from the other answers in that it generates one SQL statement for all the tables we're going to delete.
SELECT
'DROP TABLE ' ||
LISTAGG("table", ', ') ||
';'
FROM
svv_table_info
WHERE
"table" LIKE 'staging_%';
Example result:
DROP TABLE staging_077815128468462e9de8ca6fec22f284, staging_abc, staging_123;
As in other answers, you will need to copy the generated SQL and execute it separately.
References
|| operator concatenates strings
LISTAGG function concatenates every table name into a string with a separator
The table svv_table_info is used because LISTAGG doesn't want to work with pg_tables for me. Complaint:
One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT, etc
Update: I just noticed that the SVV_TABLE_INFO page says:
The SVV_TABLE_INFO view doesn't return any information for empty tables.
...which means empty tables will not be in the list returned by this query. I usually delete transient tables to save disk space, so this does not bother me much; but in general this factor should be considered.
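If you also need to catch empty tables, a fallback is to generate one statement per table from pg_tables instead, along the lines of the earlier answers (pg_tables lists empty tables too; the schema name 'public' is just an example):
SELECT 'DROP TABLE IF EXISTS ' || schemaname || '.' || tablename || ' CASCADE;'
FROM pg_tables
WHERE schemaname = 'public'
AND tablename LIKE 'staging_%';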
In my database, I have the standard app tables and backup tables. E.g. for a table "employee", I have a table called "bak_employee". The bak_employee table is a backup of the employee table; I use it to restore the employee table between tests.
I figured I can use these "bak_" tables to see the changes that have occurred during the test, like this:
SELECT * FROM employee EXCEPT SELECT * FROM bak_employee
This will show me the inserted and updated records. I'll ignore the deleted records for now.
Now, what I would like to do is go through all the tables in my database to see if there's any changes in any of the tables. I was thinking of doing this as a function so it's easy to call over and over. This is what I have so far:
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF diff_tables AS
$BODY$
DECLARE
app_tables text;
BEGIN
FOR app_tables IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name not like 'bak_%' -- exclude existing backup tables
LOOP
-- somehow loop through tables to see what's changed something like:
EXECUTE 'SELECT * FROM ' || app_tables || ' EXCEPT SELECT * FROM bak_' || app_tables;
END LOOP;
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
But obviously this isn't going to return me any useful information. Any help would be appreciated.
You cannot return various well-known row types from the same function in the same call. A cheap fix is to cast each row type to text, so we have a common return type.
CREATE OR REPLACE FUNCTION public.show_diff()
RETURNS SETOF text AS -- text!!
$func$
DECLARE
app_table text;
BEGIN
FOR app_table IN
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'myDatabase'
AND table_schema = 'public'
AND table_name NOT LIKE 'bak_%' -- exclude existing backup tables
LOOP
RETURN NEXT ' ';
RETURN NEXT '=== ' || app_table || ' ===';
RETURN QUERY EXECUTE format(
'SELECT x::text FROM (TABLE %I EXCEPT ALL TABLE %I) x'
, app_table, 'bak_' || app_table);
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM public.show_diff();
I had included the test suggested by @a_horse at first, but after your comment I realized that there is no need for it: EXCEPT considers NULL values to be equal and shows all differences.
While being at it, I improved and simplified your solution some more. Use EXCEPT ALL: it's cheaper and does not run the risk of folding complete duplicates.
Using EXCEPT clause in PostgreSQL
TABLE tbl is just syntactical sugar for SELECT * FROM tbl.
Is there a shortcut for SELECT * FROM in psql?
However, if you have an index on a unique (combination of) column(s), a JOIN like I suggested before should be faster: finding the only possible duplicate via index should be substantially cheaper.
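For a single pair of tables, such a join could look like this (a sketch, assuming id is the unique column and both tables share the same layout):
SELECT e.*
FROM   employee e
LEFT   JOIN bak_employee b USING (id)
WHERE  e IS DISTINCT FROM b;  -- catches inserted as well as changed rows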
The crucial element is the cast of the row type to text (x::text).
You can even make the function work for any table (but never more than one at a time) with a polymorphic parameter type:
Refactor a PL/pgSQL function to return the output of various SELECT queries
I have the same table in several schemas of a PostgreSQL database server. I need
to execute a query like the one below:
CREATE OR REPLACE FUNCTION git_search() RETURNS SETOF git_log AS $$
DECLARE
sch name;
BEGIN
FOREACH sch IN
select schema_name from information_schema.schemata where schema_name not in ('pg_toast','pg_temp_1','pg_toast_temp_1','pg_catalog','information_schema')
LOOP
qry := 'select count(*) from'|| quote_ident(sch) || '.git_log gl where gl.author_contributor_id = 17';
RETURN QUERY qry;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
select git_search();
but I have the error:
ERROR: type "git_log" does not exist
SQL state: 42704
The git_log table is unknown in the first line of the script (the CREATE clause).
Can anybody help me?
There are more than 100 schemas where I need to perform the query, adjusted for this situation. What is the best way to do this? Where should I create the function for this purpose?
The table name would serve just fine as composite type name, because a composite type of the same name (schema-qualified) is created with every table automatically.
The immediate cause of the error: none of your tables (actually the associated composite type of the same name) named git_log can be found in the current search_path, so the type name cannot be resolved.
Since you are operating with many schemas and many instances of tables called git_log, you need to be unambiguous and schema-qualify the table name. Just pick any one of your tables in one of the schemas; they all share the same layout.
But the rest of your function isn't going to work either. It's not a "plpgsql script", but a function definition. Try this:
CREATE OR REPLACE FUNCTION git_search()
RETURNS SETOF one_schema.git_log AS
$func$
DECLARE
sch text;
BEGIN
FOR sch IN
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name NOT LIKE 'pg_%'
AND schema_name <> 'information_schema'
ORDER BY schema_name
LOOP
RETURN QUERY EXECUTE format(
'SELECT count(*)
FROM %I.git_log
WHERE author_contributor_id = 17', sch);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM git_search();
Major points
FOREACH is for looping through arrays; you need a FOR loop here.
You need dynamic SQL. Search for examples with more explanation.
Call the function with SELECT * FROM.
Related answer (one of many):
Passing column names dynamically for a record variable in PostgreSQL
I am trying to update a bunch of columns in a DB for testing purposes of a feature. I have a table that is built with hibernate so all of the columns that are created for an embedded entity begin with the same name. I.e. contact_info_address_street1, contact_info_address_street2, etc.
I am trying to figure out if there is a way to do something to the effect of:
UPDATE table SET contact_info_address_* = null;
If not, I know I can do it the long way, just looking for a way to help myself out in the future if I need to do this all over again for a different set of columns.
You need dynamic SQL for this. So you must defend against possible SQL injection.
Basic query
The basic query to generate the DML command needed can look like this:
SELECT format('UPDATE tbl SET (%s) = (%s)'
,string_agg (quote_ident(attname), ', ')
,string_agg ('NULL', ', ')
)
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';
Returns:
UPDATE tbl SET (foo_a, foo_b, foo_c) = (NULL, NULL, NULL);
Make use of the "column-list syntax" of UPDATE to shorten the code and simplify the task.
I query the system catalogs instead of information schema because the latter, while being standardized and guaranteed to be portable across major versions, is also notoriously slow and sometimes unwieldy. There are pros and cons, see:
Get column names and data types of a query, table or view
quote_ident() for the column names prevents SQL-injection - also necessary for identifiers.
string_agg() requires 9.0+.
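For Postgres 8.4, where string_agg() is not available yet (and neither is format(), which arrived in 9.1), a sketch of the same query built with array_agg() and plain concatenation:
SELECT 'UPDATE tbl SET ('
    || array_to_string(array_agg(quote_ident(attname)), ', ')
    || ') = ('
    || array_to_string(array_agg('NULL'::text), ', ')
    || ');'
FROM pg_attribute
WHERE attrelid = 'tbl'::regclass
AND NOT attisdropped
AND attnum > 0
AND attname ~~ 'foo_%';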
Full automation with PL/pgSQL function
CREATE OR REPLACE FUNCTION f_update_cols(_tbl regclass, _col_pattern text
, OUT row_ct int, OUT col_ct bigint)
LANGUAGE plpgsql AS
$func$
DECLARE
_sql text;
BEGIN
SELECT INTO _sql, col_ct
format('UPDATE %s SET (%s) = (%s)'
, _tbl  -- use the function's regclass parameter, not a hard-coded table name
, string_agg (quote_ident(attname), ', ')
, string_agg ('NULL', ', ')
)
, count(*)
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped columns
AND attnum > 0 -- no system columns
AND attname LIKE _col_pattern; -- only columns matching pattern
-- RAISE NOTICE '%', _sql; -- output SQL for debugging
EXECUTE _sql;
GET DIAGNOSTICS row_ct = ROW_COUNT;
END
$func$;
COMMENT ON FUNCTION f_update_cols(regclass, text)
IS 'Updates all columns of table _tbl ($1)
that match _col_pattern ($2) in a LIKE expression.
Returns the count of columns (col_ct) and rows (row_ct) affected.';
Call:
SELECT * FROM f_update_cols('myschema.tbl', 'foo%');
To make the function more practical, it returns information as described in the comment. More about obtaining the result status in plpgsql in the manual.
I use the variable _sql to hold the query string, so I can collect the number of columns found (col_ct) in the same query.
The object identifier type regclass is the most efficient way to automatically avoid SQL injection (and sanitize non-standard names) for the table name, too. You can use schema-qualified table names to avoid ambiguities. I would advise to do so if you (can) have multiple schemas in your db! See:
Table name as a PostgreSQL function parameter
There's no handy shortcut, sorry. If you have to do this kind of thing a lot, you could create a function that dynamically executes SQL to achieve your goal.
CREATE OR REPLACE FUNCTION reset_cols() RETURNS boolean AS $$
BEGIN
   EXECUTE (SELECT 'UPDATE table SET '
       || array_to_string(array(
             SELECT column_name::text
             FROM   information_schema.columns
             WHERE  table_name = 'table'
             AND    column_name::text LIKE 'contact_info_address_%'
          ), ' = NULL,')
       || ' = NULL');
   RETURN true;
END; $$ LANGUAGE plpgsql;
-- run the function
SELECT reset_cols();
It's not very nice though. A better function would be one that accepts the tablename and column prefix as args, which I'll leave as an exercise for the reader (see the sketch below). :)
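A hedged sketch of that exercise (illustrative names, same technique as above, with quote_ident() added for the identifiers):
CREATE OR REPLACE FUNCTION reset_cols(_tbl text, _prefix text) RETURNS boolean AS $$
BEGIN
   EXECUTE (SELECT 'UPDATE ' || quote_ident(_tbl) || ' SET '
       || array_to_string(array(
             SELECT quote_ident(column_name::text)
             FROM   information_schema.columns
             WHERE  table_name = _tbl
             AND    column_name::text LIKE _prefix || '%'
          ), ' = NULL,')
       || ' = NULL');
   RETURN true;
END; $$ LANGUAGE plpgsql;

-- e.g.: SELECT reset_cols('table', 'contact_info_address_');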
Given the following schema:
create table account_type_a (
id SERIAL UNIQUE PRIMARY KEY,
some_column VARCHAR
);
create table account_type_b (
id SERIAL UNIQUE PRIMARY KEY,
some_other_column VARCHAR
);
create view account_type_a_view AS select * from account_type_a;
create view account_type_b_view AS select * from account_type_b;
I try to create a generic trigger function in plpgsql, which enables updating the view:
create trigger trUpdate instead of UPDATE on account_type_a_view
for each row execute procedure updateAccount();
create trigger trUpdate instead of UPDATE on account_type_b_view
for each row execute procedure updateAccount();
An unsuccessful effort of mine was:
create function updateAccount() returns trigger as $$
declare
target_table varchar := substring(TG_TABLE_NAME from '(.+)_view');
cols varchar;
begin
execute 'select string_agg(column_name,$1) from information_schema.columns
where table_name = $2' using ',', target_table into cols;
execute 'update ' || target_table || ' set (' || cols || ') = select ($1).*
where id = ($1).id' using NEW;
return NULL;
end;
$$ language plpgsql;
The problem is the update statement. I am unable to come up with a syntax that would work here. I have successfully implemented this in PL/Perl, but would be interested in a plpgsql-only solution.
Any ideas?
Update
As @Erwin Brandstetter suggested, here is the code for my PL/Perl solution. I incorporated some of his suggestions.
create function f_tr_up() returns trigger as $$
use strict;
use warnings;
my $target_table = quote_ident($_TD->{'table_name'}) =~ s/^([\w]+)_view$/$1/r;
my $NEW = $_TD->{'new'};
my $cols = join(',', map { quote_ident($_) } keys %$NEW);
my $vals = join(',', map { quote_literal($_) } values %$NEW);
my $query = sprintf(
"update %s set (%s) = (%s) where id = %d",
$target_table,
$cols,
$vals,
$NEW->{'id'});
spi_exec_query($query);
return;
$$ language plperl;
While @Gary's answer is technically correct, it fails to mention that PostgreSQL does support this form:
UPDATE tbl
SET (col1, col2, ...) = (expression1, expression2, ..)
Read the manual on UPDATE.
It's still tricky to get this done with dynamic SQL. I'll assume a simple case where views consist of the same columns as their underlying tables.
CREATE VIEW tbl_view AS SELECT * FROM tbl;
Problems
The special record NEW is not visible inside EXECUTE. I pass NEW as a single parameter with the USING clause of EXECUTE.
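A tiny self-contained demo of that mechanism (using the catalog table pg_namespace only because it is guaranteed to exist; not part of the original answer):
DO $$
DECLARE
   _row pg_namespace;  -- any row type works; NEW in a trigger behaves the same way
   _val text;
BEGIN
   SELECT * INTO _row FROM pg_namespace LIMIT 1;
   EXECUTE 'SELECT ($1).nspname' INTO _val USING _row;  -- $1 is bound by USING
   RAISE NOTICE '%', _val;
END $$;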
As discussed, UPDATE with list-form needs individual values. I use a subselect to split the record into individual columns:
UPDATE ...
FROM (SELECT ($1).*) x
(Parentheses around $1 are not optional.) This allows me to simply use two column lists built with string_agg() from the catalog table: one with and one without table qualification.
It's not possible to assign a row value as a whole to individual columns. The manual:
According to the standard, the source value for a parenthesized
sub-list of target column names can be any row-valued expression
yielding the correct number of columns. PostgreSQL only allows the
source value to be a row constructor or a sub-SELECT.
INSERT is simpler to implement. If the structure of view and table are identical, we can omit the column definition list. (Can be improved, see below.)
Solution
I made a couple of updates to your approach to make it shine.
Trigger function for UPDATE:
CREATE OR REPLACE FUNCTION f_trg_up()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME from '(.+)_view$'));
_cols text;
_vals text;
BEGIN
SELECT INTO _cols, _vals
string_agg(quote_ident(attname), ', ')
, string_agg('x.' || quote_ident(attname), ', ')
FROM pg_attribute
WHERE attrelid = _tbl
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0; -- no system columns
EXECUTE format('
UPDATE %s
SET (%s) = (%s)
FROM (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Trigger function for INSERT:
CREATE OR REPLACE FUNCTION f_trg_ins()
RETURNS TRIGGER
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := quote_ident(TG_TABLE_SCHEMA) || '.'
|| quote_ident(substring(TG_TABLE_NAME FROM '(.+)_view$'));
BEGIN
EXECUTE format('INSERT INTO %s SELECT ($1).*', _tbl)
USING NEW;
RETURN NEW; -- Don't return NULL unless you know what you're doing
END
$func$;
Triggers:
CREATE TRIGGER trg_instead_up
INSTEAD OF UPDATE ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_up();
CREATE TRIGGER trg_instead_ins
INSTEAD OF INSERT ON a_view
FOR EACH ROW EXECUTE FUNCTION f_trg_ins();
Before Postgres 11 the syntax (oddly) was EXECUTE PROCEDURE instead of EXECUTE FUNCTION - which also still works.
Major points
Include the schema name to make the table reference unambiguous. There can be multiple tables of the same name in one database with multiple schemas!
Query pg_catalog.pg_attribute instead of information_schema.columns. Less portable, but much faster, and it allows using the table OID.
How to check if a table exists in a given schema
Table names are NOT safe against SQL injection when concatenated as strings for dynamic SQL. Escape with quote_ident() or format() or with an object-identifier type. This includes the special trigger function variables TG_TABLE_SCHEMA and TG_TABLE_NAME!
Cast to the object identifier type regclass to assert the table name is valid and get the OID for the catalog look-up.
Optionally use format() to build the dynamic query string safely.
No need for dynamic SQL for the first query on the catalog tables. Faster, simpler.
Use RETURN NEW instead of RETURN NULL in these trigger functions unless you know what you are doing. (NULL would cancel the INSERT for the current row.)
This simple version assumes that every table (and view) has a unique column named id. A more sophisticated version might use the primary key dynamically.
The function for UPDATE allows the columns of view and table to be in any order, as long as the set is the same.
The function for INSERT expects the columns of view and table to be in identical order. If you want to allow arbitrary order, add a column definition list to the INSERT command, just like with UPDATE (see the sketch after this list).
The updated version also covers changes to the id column by additionally using OLD.
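A hedged sketch of that column-list variant for INSERT, reusing _tbl, _cols and _vals exactly as they are built in the UPDATE trigger function above (my assumption, not part of the original answer):
EXECUTE format('
   INSERT INTO %s (%s)
   SELECT %s
   FROM  (SELECT ($1).*) x', _tbl, _cols, _vals)
USING NEW;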
PostgreSQL doesn't support updating multiple columns using the set (col1, col2) = select val1, val2 syntax.
To achieve the same in postgresql you'd use
update target_table
set col1 = d.val1,
col2 = d.val2
from source_table d
where d.id = target_table.id
This is going to make the dynamic query a bit more complex to build, as you'll need to expand the column name list into individual assignments. I'd suggest you use array_agg instead of string_agg, as an array is easier to process than splitting the string again (see the rough sketch after the links below).
Postgresql UPDATE syntax
documentation on array_agg function
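A rough sketch of that approach (the names target_table, source_table and the column prefix are illustrative; FOREACH iterates over the aggregated array):
DO $$
DECLARE
   _cols text[];
   _c    text;
   _set  text := '';
BEGIN
   SELECT array_agg(column_name::text)
   INTO   _cols
   FROM   information_schema.columns
   WHERE  table_name = 'target_table'
   AND    column_name::text LIKE 'contact_info_address_%';

   FOREACH _c IN ARRAY _cols LOOP
      _set := _set || format('%1$I = d.%1$I, ', _c);  -- col = d.col
   END LOOP;

   EXECUTE format('UPDATE target_table t SET %s FROM source_table d WHERE d.id = t.id'
                , left(_set, -2));  -- trim the trailing ", "
END $$;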