Postgres and Indexes on Foreign Keys and Primary Keys

Postgres and Indexes on Foreign Keys and Primary Keys - sql

Does Postgres automatically put indexes on Foreign Keys and Primary Keys? How can I tell? Is there a command that will return all indexes on a table?

PostgreSQL automatically creates indexes on primary keys and unique constraints, but not on the referencing side of foreign key relationships.
When Pg creates an implicit index it will emit a NOTICE-level message that you can see in psql and/or the system logs, so you can see when it happens. Automatically created indexes are visible in \d output for a table, too.
The documentation on unique indexes says:
PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness. Thus, it is not necessary to create an index explicitly for primary key columns.
and the documentation on constraints says:
Since a DELETE of a row from the referenced table or an UPDATE of a
referenced column will require a scan of the referencing table for
rows matching the old value, it is often a good idea to index the
referencing columns. Because this is not always needed, and there are
many choices available on how to index, declaration of a foreign key
constraint does not automatically create an index on the referencing
columns.
Therefore you have to create indexes on foreign-keys yourself if you want them.
Note that if you use primary-foreign-keys, like 2 FK's as a PK in a M-to-N table, you will have an index on the PK and probably don't need to create any extra indexes.
While it's usually a good idea to create an index on (or including) your referencing-side foreign key columns, it isn't required. Each index you add slows DML operations down slightly, so you pay a performance cost on every INSERT, UPDATE or DELETE. If the index is rarely used it may not be worth having.

This query will list missing indexes on foreign keys, original source.
Edit: Note that it will not check small tables (less then 9 MB) and some other cases. See final WHERE statement.
-- check for FKs where there is no matching index
-- on the referencing side
-- or a bad index
WITH fk_actions ( code, action ) AS (
VALUES ( 'a', 'error' ),
( 'r', 'restrict' ),
( 'c', 'cascade' ),
( 'n', 'set null' ),
( 'd', 'set default' )
),
fk_list AS (
SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
conname, relname, nspname,
fk_actions_update.action as update_action,
fk_actions_delete.action as delete_action,
conkey as key_cols
FROM pg_constraint
JOIN pg_class ON conrelid = pg_class.oid
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
WHERE contype = 'f'
),
fk_attributes AS (
SELECT fkoid, conrelid, attname, attnum
FROM fk_list
JOIN pg_attribute
ON conrelid = attrelid
AND attnum = ANY( key_cols )
ORDER BY fkoid, attnum
),
fk_cols_list AS (
SELECT fkoid, array_agg(attname) as cols_list
FROM fk_attributes
GROUP BY fkoid
),
index_list AS (
SELECT indexrelid as indexid,
pg_class.relname as indexname,
indrelid,
indkey,
indpred is not null as has_predicate,
pg_get_indexdef(indexrelid) as indexdef
FROM pg_index
JOIN pg_class ON indexrelid = pg_class.oid
WHERE indisvalid
),
fk_index_match AS (
SELECT fk_list.*,
indexid,
indexname,
indkey::int[] as indexatts,
has_predicate,
indexdef,
array_length(key_cols, 1) as fk_colcount,
array_length(indkey,1) as index_colcount,
round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
cols_list
FROM fk_list
JOIN fk_cols_list USING (fkoid)
LEFT OUTER JOIN index_list
ON conrelid = indrelid
AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] #> key_cols
),
fk_perfect_match AS (
SELECT fkoid
FROM fk_index_match
WHERE (index_colcount - 1) <= fk_colcount
AND NOT has_predicate
AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
SELECT 'no index' as issue, *, 1 as issue_sort
FROM fk_index_match
WHERE indexid IS NULL
UNION ALL
SELECT 'questionable index' as issue, *, 2
FROM fk_index_match
WHERE indexid IS NOT NULL
AND fkoid NOT IN (
SELECT fkoid
FROM fk_perfect_match)
),
parent_table_stats AS (
SELECT fkoid, tabstats.relname as parent_name,
(n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
FROM pg_stat_user_tables AS tabstats
JOIN fk_list
ON relid = parentid
),
fk_table_stats AS (
SELECT fkoid,
(n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
seq_scan as table_scans
FROM pg_stat_user_tables AS tabstats
JOIN fk_list
ON relid = conrelid
)
SELECT nspname as schema_name,
relname as table_name,
conname as fk_name,
issue,
table_mb,
writes,
table_scans,
parent_name,
parent_mb,
parent_writes,
cols_list,
indexdef
FROM fk_index_check
JOIN parent_table_stats USING (fkoid)
JOIN fk_table_stats USING (fkoid)
WHERE table_mb > 9
AND ( writes > 1000
OR parent_writes > 1000
OR parent_mb > 10 )
ORDER BY issue_sort, table_mb DESC, table_name, fk_name;

If you want to list the indexes of all the tables in your schema(s) from your program, all the information is on hand in the catalog:
select
n.nspname as "Schema"
,t.relname as "Table"
,c.relname as "Index"
from
pg_catalog.pg_class c
join pg_catalog.pg_namespace n on n.oid = c.relnamespace
join pg_catalog.pg_index i on i.indexrelid = c.oid
join pg_catalog.pg_class t on i.indrelid = t.oid
where
c.relkind = 'i'
and n.nspname not in ('pg_catalog', 'pg_toast')
and pg_catalog.pg_table_is_visible(c.oid)
order by
n.nspname
,t.relname
,c.relname
If you want to delve further (such as columns and ordering), you need to look at pg_catalog.pg_index. Using psql -E [dbname] comes in handy for figuring out how to query the catalog.

Yes - for primary keys, no - for foreign keys (more in the docs).
\d <table_name>
in "psql" shows a description of a table including all its indexes.

I love how this is explained in the article Cool performance features of EclipseLink 2.5
Indexing Foreign Keys
The first feature is auto indexing of foreign keys. Most people incorrectly assume that databases index
foreign keys by default. Well, they don't. Primary keys are auto
indexed, but foreign keys are not. This means any query based on the
foreign key will be doing full table scans. This is any OneToMany,
ManyToMany or ElementCollection relationship, as well as many OneToOne
relationships, and most queries on any relationship involving joins or
object comparisons. This can be a major perform issue, and you should
always index your foreign keys fields.

This function, based on the work by Laurenz Albe at https://www.cybertec-postgresql.com/en/index-your-foreign-key/, list all the foreign keys with missing indexes. The size of the table is shown, as for small tables the scanning performance could be superior to the index one.
--
-- function: missing_fk_indexes
-- purpose: List all foreing keys in the database without and index in the referencing table.
-- author: Based on the work of Laurenz Albe
-- see: https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes ()
returns table (
referencing_table regclass,
fk_columns varchar,
table_size varchar,
fk_constraint name,
referenced_table regclass
)
language sql as $$
select
-- referencing table having ta foreign key declaration
tc.conrelid::regclass as referencing_table,
-- ordered list of foreign key columns
string_agg(ta.attname, ', ' order by tx.n) as fk_columns,
-- referencing table size
pg_catalog.pg_size_pretty (
pg_catalog.pg_relation_size(tc.conrelid)
) as table_size,
-- name of the foreign key constraint
tc.conname as fk_constraint,
-- name of the target or destination table
tc.confrelid::regclass as referenced_table
from pg_catalog.pg_constraint tc
-- enumerated key column numbers per foreign key
cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
-- name for each key column
join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
where not exists (
-- is there ta matching index for the constraint?
select 1 from pg_catalog.pg_index i
where
i.indrelid = tc.conrelid and
-- the first index columns must be the same as the key columns, but order doesn't matter
(i.indkey::smallint[])[0:cardinality(tc.conkey)-1] #> tc.conkey) and
tc.contype = 'f'
group by
tc.conrelid,
tc.conname,
tc.confrelid
order by
pg_catalog.pg_relation_size(tc.conrelid) desc
$$;
test it this way,
select * from missing_fk_indexes();
you'll see a list like this.
referencing_table | fk_columns | table_size | fk_constraint | referenced_table
------------------------+------------------+------------+----------------------------------------------+------------------
stk_warehouse | supplier_id | 8192 bytes | stk_warehouse_supplier_id_fkey | stk_supplier
stk_reference | supplier_id | 0 bytes | stk_reference_supplier_id_fkey | stk_supplier
stk_part_reference | reference_id | 0 bytes | stk_part_reference_reference_id_fkey | stk_reference
stk_warehouse_part | part_id | 0 bytes | stk_warehouse_part_part_id_fkey | stk_part
stk_warehouse_part_log | dst_warehouse_id | 0 bytes | stk_warehouse_part_log_dst_warehouse_id_fkey | stk_warehouse
stk_warehouse_part_log | part_id | 0 bytes | stk_warehouse_part_log_part_id_fkey | stk_part
stk_warehouse_part_log | src_warehouse_id | 0 bytes | stk_warehouse_part_log_src_warehouse_id_fkey | stk_warehouse
stk_product_part | part_id | 0 bytes | stk_product_part_part_id_fkey | stk_part
stk_purchase | parent_id | 0 bytes | stk_purchase_parent_id_fkey | stk_purchase
stk_purchase | supplier_id | 0 bytes | stk_purchase_supplier_id_fkey | stk_supplier
stk_purchase_line | reference_id | 0 bytes | stk_purchase_line_reference_id_fkey | stk_reference
stk_order | freighter_id | 0 bytes | stk_order_freighter_id_fkey | stk_freighter
stk_order_line | product_id | 0 bytes | stk_order_line_product_id_fkey | cnt_product
stk_order_fulfillment | freighter_id | 0 bytes | stk_order_fulfillment_freighter_id_fkey | stk_freighter
stk_part | sibling_id | 0 bytes | stk_part_sibling_id_fkey | stk_part
stk_order_part | part_id | 0 bytes | stk_order_part_part_id_fkey | stk_part
For those who decided to systematically create and index on every referencing column, this other version could be more efficient:
--
-- function: missing_fk_indexes2
-- purpose: List all foreing keys in the database without and index in the referencing table.
-- The listing contains create index sentences
-- author: Based on the work of Laurenz Albe
-- see: https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes2 ()
returns setof varchar
language sql as $$
select
-- create index sentence
'create index on ' ||
tc.conrelid::regclass ||
'(' ||
string_agg(ta.attname, ', ' order by tx.n) ||
')' as create_index
from pg_catalog.pg_constraint tc
-- enumerated key column numbers per foreign key
cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
-- name for each key column
join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
where not exists (
-- is there ta matching index for the constraint?
select 1 from pg_catalog.pg_index i
where
i.indrelid = tc.conrelid and
-- the first index columns must be the same as the key columns, but order doesn't matter
(i.indkey::smallint[])[0:cardinality(tc.conkey)-1] #> tc.conkey) and
tc.contype = 'f'
group by
tc.conrelid,
tc.conname,
tc.confrelid
order by
pg_catalog.pg_relation_size(tc.conrelid) desc
$$;
Now the output is the create index sentence you have to add to your database.
select * from missing_fk_indexes2();
missing_fk_indexes2
----------------------------------------------------------
create index on stk_warehouse(supplier_id)
create index on stk_reference(supplier_id)
create index on stk_part_reference(reference_id)
create index on stk_warehouse_part(part_id)
create index on stk_warehouse_part_log(dst_warehouse_id)
create index on stk_warehouse_part_log(part_id)
create index on stk_warehouse_part_log(src_warehouse_id)
create index on stk_product_part(part_id)
create index on stk_purchase(parent_id)
create index on stk_purchase(supplier_id)
create index on stk_purchase_line(reference_id)
create index on stk_order(freighter_id)
create index on stk_order_line(product_id)
create index on stk_order_fulfillment(freighter_id)
create index on stk_part(sibling_id)
create index on stk_order_part(part_id)

For a PRIMARY KEY, an index will be created with the following message:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "index" for table "table"
For a FOREIGN KEY, the constraint will not be created if there is no index on the referenced table.
An index on referencing table is not required (though desired), and therefore will not be implicitly created.

And here's a bash script that generates the SQL to create indexes for missing indexes on foreign keys using #sergeyB's SQL.
#!/bin/bash
read -r -d '' SQL <<EOM
WITH fk_actions ( code, action ) AS (
VALUES ( 'a', 'error' ),
( 'r', 'restrict' ),
( 'c', 'cascade' ),
( 'n', 'set null' ),
( 'd', 'set default' )
),
fk_list AS (
SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
conname, relname, nspname,
fk_actions_update.action as update_action,
fk_actions_delete.action as delete_action,
conkey as key_cols
FROM pg_constraint
JOIN pg_class ON conrelid = pg_class.oid
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
WHERE contype = 'f'
),
fk_attributes AS (
SELECT fkoid, conrelid, attname, attnum
FROM fk_list
JOIN pg_attribute
ON conrelid = attrelid
AND attnum = ANY( key_cols )
ORDER BY fkoid, attnum
),
fk_cols_list AS (
SELECT fkoid, array_to_string(array_agg(attname), ':') as cols_list
FROM fk_attributes
GROUP BY fkoid
),
index_list AS (
SELECT indexrelid as indexid,
pg_class.relname as indexname,
indrelid,
indkey,
indpred is not null as has_predicate,
pg_get_indexdef(indexrelid) as indexdef
FROM pg_index
JOIN pg_class ON indexrelid = pg_class.oid
WHERE indisvalid
),
fk_index_match AS (
SELECT fk_list.*,
indexid,
indexname,
indkey::int[] as indexatts,
has_predicate,
indexdef,
array_length(key_cols, 1) as fk_colcount,
array_length(indkey,1) as index_colcount,
round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
cols_list
FROM fk_list
JOIN fk_cols_list USING (fkoid)
LEFT OUTER JOIN index_list
ON conrelid = indrelid
AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] #> key_cols
),
fk_perfect_match AS (
SELECT fkoid
FROM fk_index_match
WHERE (index_colcount - 1) <= fk_colcount
AND NOT has_predicate
AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
SELECT 'no index' as issue, *, 1 as issue_sort
FROM fk_index_match
WHERE indexid IS NULL
UNION ALL
SELECT 'questionable index' as issue, *, 2
FROM fk_index_match
WHERE indexid IS NOT NULL
AND fkoid NOT IN (
SELECT fkoid
FROM fk_perfect_match)
),
parent_table_stats AS (
SELECT fkoid, tabstats.relname as parent_name,
(n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
FROM pg_stat_user_tables AS tabstats
JOIN fk_list
ON relid = parentid
),
fk_table_stats AS (
SELECT fkoid,
(n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
seq_scan as table_scans
FROM pg_stat_user_tables AS tabstats
JOIN fk_list
ON relid = conrelid
)
SELECT relname as table_name,
cols_list
FROM fk_index_check
JOIN parent_table_stats USING (fkoid)
JOIN fk_table_stats USING (fkoid)
ORDER BY issue_sort, table_mb DESC, table_name;
EOM
DB_NAME="dbname"
DB_USER="dbuser"
DB_PASSWORD="dbpass"
DB_HOSTNAME="hostname"
DB_PORT=5432
export PGPASSWORD="$DB_PASSWORD"
psql -h $DB_HOSTNAME -p $DB_PORT -U $DB_USER -d $DB_NAME -t -A -F"," -c "$SQL" | while read -r line; do
IFS=','
parts=($line)
unset IFS
tableName=${parts[0]}
colsList=${parts[1]}
indexName="${tableName}_${colsList//:/_}_index"
printf -- "\n--Index: %s\nDROP INDEX IF EXISTS %s;\n
CREATE INDEX %s\n\t\tON %s USING btree\n\t(%s);
" "$indexName" "$indexName" "$indexName" "$tableName" "$colsList"
done

Related

Partial or full index in Oracle?

I'm working on a library that retrieves database metadata for multiple databases, and I wanted to find if an index is partial or full in Oracle.
According to Oracle docs, an index is partial if all its columns can be null. In that case not all rows will be included in the index. I looked at the table all_indexes but couldn't find details if the index is full or partial.
For example:
create table t (
a int not null,
b int,
c int not null
);
Is there any way of determining if the following indexes are partial or full?
create index ix1 on t (a); -- full index
create index ix2 on t (b); -- partial index
create index ix3 on t (a + c, a * c); -- full index
create index ix4 on t (a * b); -- partial index
-- Now, an unlisted table constraint (unique + partial index)
create unique index ix5 on t (case when b = 1 then a end);
We can see the unlisted constraint at work when we try to insert:
insert into t (a, b, c) values (123, 1, 5); -- succeeds
insert into t (a, b, c) values (123, 1, 6); -- fails as expected!
Determining the index is partial is crucial to find unlisted table constraints, such as ix5 above.

The information you want can be determined by querying the ALL_INDEXES, ALL_IND_COLUMNS, and ALL_TAB_COLUMNS views:
WITH cteIndex_column_info AS
(
SELECT
ic.INDEX_OWNER,
ic.INDEX_NAME,
1 AS COLUMN_COUNT,
CASE
WHEN tc.NULLABLE = 'Y' THEN 1
ELSE 0
END AS NULLABLE_COLUMN_COUNT
FROM
ALL_INDEXES i
INNER JOIN
ALL_IND_COLUMNS ic ON ic.INDEX_OWNER = i.OWNER
AND ic.INDEX_NAME = i.INDEX_NAME
INNER JOIN
ALL_TAB_COLUMNS tc ON tc.OWNER = i.TABLE_OWNER
AND tc.TABLE_NAME = i.TABLE_NAME
AND tc.COLUMN_NAME = ic.COLUMN_NAME
)
SELECT
INDEX_OWNER,
INDEX_NAME,
CASE
WHEN SUM(COLUMN_COUNT) = SUM(NULLABLE_COLUMN_COUNT) THEN 'PARTIAL'
ELSE 'FULL'
END AS INDEX_TYPE
FROM
cteIndex_column_info
GROUP BY
INDEX_OWNER, INDEX_NAME
ORDER BY
INDEX_OWNER, INDEX_NAME
db<>fiddle here

Bad attname value on pg_attribute

i have 2 postgres DB's with the same table X, and primary key PKEY (i'm not creator of this).
When im looking on it using my client (i try 2 different) or extracting ddl, i got identical source like this:
...
CONSTRAINT pkey PRIMARY KEY(column1, column2)
...
The problem is what i see in pg_attribute.attname - first DB have correct values (column1 and column2) but the second have column1 and id (?). The rest of data (attnum and other parameters) are identical...It's interesting that column id not exists on this table (X)...maybe one day it existed, but i'm not sure how to check it).
This is a production environment, so recreating index etc it's not easy...Have you met with a similar situation?
comparison method:
select cls.oid,
nsp.nspname as object_schema,
cls.relname as object_name,
a.attname,
case cls.relkind
when 'r' then 'TABLE'
when 'm' then 'MATERIALIZED_VIEW'
when 'i' then 'INDEX'
when 'S' then 'SEQUENCE'
when 'v' then 'VIEW'
when 'c' then 'TYPE'
else cls.relkind::text
end as type
from pg_class cls
join pg_roles rol on rol.oid = cls.relowner
join pg_namespace nsp on nsp.oid = cls.relnamespace
join pg_catalog.pg_attribute a on a.attrelid = cls.oid
left outer join pg_attrdef ad on (ad.adrelid = cls.oid and ad.adnum = a.attnum)
left outer join pg_constraint con on cls.oid = con.conrelid
where nsp.nspname not in ('information_schema', 'pg_catalog')
and nsp.nspname not like 'pg_toast%'
and a.attnum > 0
AND NOT a.attisdropped
and cls.relname like '%my_pkey_real_name%'
order by 1, 2
This query returns 'column1' and 'column2' in attname column on DB1 and 'column1' and 'id' on DB2.
As i wrote - the problem is that the column 'id' don't exists...when i'm extract ddl i'm getting somthing like:
ALTER TABLE X
ADD CONSTRAINT pkey
PRIMARY KEY (column1, column2) NOT DEFERRABLE;
on both DB's

So, i know the reason but not a solution :(
Let's create a test table:
CREATE TABLE test.test_x (
id VARCHAR(128) NOT NULL,
CONSTRAINT pkey PRIMARY KEY(id)
)
WITH (oids = false);
then query pg_class and pg_attribute:
select c.oid, c.* from pg_class c where relname = 'pkey'
select * from pg_attribute where attrelid = 854514857
the result of second query is:
next reneme the primary key column:
ALTER TABLE test.test_x RENAME COLUMN id TO id_x;
the name of this column have changed, ddl is ok, value in pg_class (queried for the table context) is ok (id_x, not id), but in pg_attribute is still the older one:
i tried reindex this table (or only index) but it didn't help

Query to get parent table using table child table

I'm looking for query to get the parent table details(name) using child table name and child table schema.
I browsed over the web but didn't get any query.
CREATE TABLE smt.items (
item_code INTEGER PRIMARY KEY DEFAULT '1001'
,item_name CHARACTER(35) NOT NULL
,purchase_unit CHARACTER(10)
,sale_unit CHARACTER(10)
,purchase_price NUMERIC(10, 2)
,sale_price NUMERIC(10, 2)
);
CREATE TABLE smt.sub_items (
sub_item_id INTEGER PRIMARY KEY
,sub_items_name CHARACTER(35) NOT NULL
) inherits (smt.items);

Something like this:
select bt.relname as table_name, bns.nspname as table_schema
from pg_class ct
join pg_namespace cns on ct.relnamespace = cns.oid
join pg_inherits i on i.inhrelid = ct.oid
join pg_class bt on i.inhparent = bt.oid
join pg_namespace bns on bt.relnamespace = bns.oid
where bt.relkind <> 'p'
and cns.nspname = 'public'
and ct.relname = 'child_table_name';

How can I find tables which reference a particular row via a foreign key?

Given a structure like this:
CREATE TABLE reference_table (
reference_table_key numeric NOT NULL,
reference_value numeric,
CONSTRAINT reference_table_pk PRIMARY KEY (reference_table_key)
);
CREATE TABLE other_table (
other_table_key numeric NOT NULL,
reference_table_key numeric,
CONSTRAINT other_table_pk PRIMARY KEY (other_table_key),
ONSTRAINT other_table_reference_fk FOREIGN KEY (reference_table_key)
REFERENCES reference_table (reference_table_key) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
);
CREATE TABLE another_table (
another_table_key numeric NOT NULL,
do_stuff_key numeric,
CONSTRAINT another_table_pk PRIMARY KEY (another_table_key),
ONSTRAINT another_table_reference_fk FOREIGN KEY (do_stuff_key)
REFERENCES reference_table (reference_table_key) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
);
--there are 50-60 tables which have similar foreign key references to reference_table
I want to write a query that tells me the primary keys in other_table and another_table and potentially more tables where reference_value is NULL.
In psuedo-code:
SELECT table_name, table_primary_key, table_fk_column_name
FROM ?????? some PG table ???????, reference_table
WHERE reference_table.reference_value IS NULL;
The result would look something like:
table_name | table_primary_key | table_fk_column_name | reference_table_pk
---------------------------------------------------------------------------
other_table | 2 | reference_table_key | 7
other_table | 4 | reference_table_key | 56
other_table | 45 | reference_table_key | 454
other_table | 65765 | reference_table_key | 987987
other_table | 11 | reference_table_key | 3213
another_table | 3 | do_stuff_key | 4645
another_table | 5 | do_stuff_key | 43546
another_table | 7 | do_stuff_key | 464356
unknown_table | 1 | unkown_column_key | 435435
unknown_table | 1 | some_other_column_key | 34543
unknown_table | 3 | unkown_column_key | 124
unknown_table | 3 | some_other_column_key | 123
This is similar to, but not a duplicate of Postgres: SQL to list table foreign keys . That question shows the table structure. I want to find specific instances.
Essentially if I were to DELETE FROM reference_table WHERE reference_value IS NULL;, postgres has to do something internally to figure out that it needs to set reference_table_key in row 2 in other_table to NULL. I want to see what those rows would be.
Is there a query that can do this? Is there a modifier that I can pass to a DELETE call that would tell me what tables/rows/columns would be affected by that DELETE?

NULL values in referencing columns
This query produces the DML statement to find all rows in all tables, where a column has a foreign-key constraint referencing another table but hold a NULL value in that column:
WITH x AS (
SELECT c.conrelid::regclass AS tbl
, c.confrelid::regclass AS ftbl
, quote_ident(k.attname) AS fk
, quote_ident(pf.attname) AS pk
FROM pg_constraint c
JOIN pg_attribute k ON (k.attrelid, k.attnum) = (c.conrelid, c.conkey[1])
JOIN pg_attribute f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
LEFT JOIN pg_constraint p ON p.conrelid = c.conrelid AND p.contype = 'p'
LEFT JOIN pg_attribute pf ON (pf.attrelid, pf.attnum)
= (p.conrelid, p.conkey[1])
WHERE c.contype = 'f'
AND c.confrelid = 'fk_tbl'::regclass -- references to this tbl
AND f.attname = 'fk_tbl_id' -- and only to this column
)
SELECT string_agg(format(
'SELECT %L AS tbl
, %L AS pk
, %s::text AS pk_val
, %L AS fk
, %L AS ftbl
FROM %1$s WHERE %4$s IS NULL'
, tbl
, COALESCE(pk 'NONE')
, COALESCE(pk 'NULL')
, fk
, ftbl), '
UNION ALL
') || ';'
FROM x;
Produces a query like this:
SELECT 'some_tbl' AS tbl
, 'some_tbl_id' AS pk
, some_tbl_id::text AS pk_val
, 'fk_tbl_id' AS fk
, 'fk_tbl' AS ftbl
FROM some_tbl WHERE fk_tbl_id IS NULL
UNION ALL
SELECT 'other_tbl' AS tbl
, 'other_tbl_id' AS pk
, other_tbl_id::text AS pk_val
, 'some_name_id' AS fk
, 'fk_tbl' AS ftbl
FROM other_tbl WHERE some_name_id IS NULL;
Produces output like this:
tbl | pk | pk_val | fk | ftbl
-----------+--------------+--------+--------------+--------
some_tbl | some_tbl_id | 49 | fk_tbl_id | fk_tbl
some_tbl | some_tbl_id | 58 | fk_tbl_id | fk_tbl
other_tbl | other_tbl_id | 66 | some_name_id | fk_tbl
other_tbl | other_tbl_id | 67 | some_name_id | fk_tbl
Does not cover multi-column foreign or primary keys reliably. You have to make the query more complex for this.
I cast all primary key values to text to cover all types.
Adapt or remove these lines to find foreign key pointing to an other or any column / table:
AND c.confrelid = 'fk_tbl'::regclass
AND f.attname = 'fk_tbl_id' -- and only this column
Tested with PostgreSQL 9.1.4. I use the pg_catalog tables. Realistically nothing of what I use here is going to change, but that is not guaranteed across major releases. Rewrite it with tables from information_schema if you need it to work reliably across updates. That is slower, but sure.
I did not sanitize table names in the generated DML script, because quote_ident() would fail with schema-qualified names. It is your responsibility to avoid harmful table names like "users; DELETE * FROM users;". With some more effort, you can retrieve schema-name and table name separately and use quote_ident().
NULL values in referenced columns
My first solution does something subtly different from what you ask, because what you describe (as I understand it) is non-existent. The value NULL is "unknown" and cannot be referenced. If you actually want to find rows with a NULL value in a column that has FK constraints pointing to it (not to the particular row with the NULL value, of course), then the query can be much simplified:
WITH x AS (
SELECT c.confrelid::regclass AS ftbl
,quote_ident(f.attname) AS fk
,quote_ident(pf.attname) AS pk
,string_agg(c.conrelid::regclass::text, ', ') AS referencing_tbls
FROM pg_constraint c
JOIN pg_attribute f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
LEFT JOIN pg_constraint p ON p.conrelid = c.confrelid AND p.contype = 'p'
LEFT JOIN pg_attribute pf ON (pf.attrelid, pf.attnum)
= (p.conrelid, p.conkey[1])
WHERE c.contype = 'f'
-- AND c.confrelid = 'fk_tbl'::regclass -- only referring this tbl
GROUP BY 1, 2, 3
)
SELECT string_agg(format(
'SELECT %L AS ftbl
, %L AS pk
, %s::text AS pk_val
, %L AS fk
, %L AS referencing_tbls
FROM %1$s WHERE %4$s IS NULL'
, ftbl
, COALESCE(pk, 'NONE')
, COALESCE(pk, 'NULL')
, fk
, referencing_tbls), '
UNION ALL
') || ';'
FROM x;
Finds all such rows in the entire database (commented out the restriction to one table). Tested with Postgres 9.1.4 and works for me.
I group multiple tables referencing the same foreign column into one query and add a list of referencing tables to give an overview.

You want a union for this query:
select *
from ((select 'other_table' as table_name,
other_table_key as primary_key,
'reference_table_key' as table_fk,
ot.reference_table_key
from other_table ot left outer join
reference_table rt
on ot.reference_table_key = rt.reference_table_key
where rt.reference_value is null
) union all
(select 'another_table' as table_name,
another_table_key as primary_key,
'do_stuff_key' as table_fk,
at.do_stuff_key
from another_table at left outer join
reference_table rt
on at.do_stuff_key = rt.reference_table_key
where rt.reference_value is null
)
) t

PostgreSQL: SQL script to get a list of all tables that have a particular column as foreign key

I'm using PostgreSQL and I'm trying to list all the tables that have a particular column from a table as a foreign-key/reference. Can this be done? I'm sure this information is stored somewhere in information_schema but I have no idea how to start querying it.

SELECT
r.table_name
FROM information_schema.constraint_column_usage u
INNER JOIN information_schema.referential_constraints fk
ON u.constraint_catalog = fk.unique_constraint_catalog
AND u.constraint_schema = fk.unique_constraint_schema
AND u.constraint_name = fk.unique_constraint_name
INNER JOIN information_schema.key_column_usage r
ON r.constraint_catalog = fk.constraint_catalog
AND r.constraint_schema = fk.constraint_schema
AND r.constraint_name = fk.constraint_name
WHERE
u.column_name = 'id' AND
u.table_catalog = 'db_name' AND
u.table_schema = 'public' AND
u.table_name = 'table_a'
This uses the full catalog/schema/name triplet to identify a db table from all 3 information_schema views. You can drop one or two as required.
The query lists all tables that have a foreign key constraint against the column 'a' in table 'd'

The other solutions are not guaranteed to work in postgresql, as the constraint_name is not guaranteed to be unique; thus you will get false positives. PostgreSQL used to name constraints silly things like '$1', and if you've got an old database you've been maintaining through upgrades, you likely still have some of those around.
Since this question was targeted AT PostgreSQL and that is what you are using, then you can query the internal postgres tables pg_class and pg_attribute to get a more accurate result.
NOTE: FKs can be on multiple columns, thus the referencing column (attnum of pg_attribute) is an ARRAY, which is the reason for using array_agg in the answer.
The only thing you need plug in is the TARGET_TABLE_NAME:
select
(select r.relname from pg_class r where r.oid = c.conrelid) as table,
(select array_agg(attname) from pg_attribute
where attrelid = c.conrelid and ARRAY[attnum] <# c.conkey) as col,
(select r.relname from pg_class r where r.oid = c.confrelid) as ftable
from pg_constraint c
where c.confrelid = (select oid from pg_class where relname = 'TARGET_TABLE_NAME');
If you want to go the other way (list all of the things a specific table refers to), then just change the last line to:
where c.conrelid = (select oid from pg_class where relname = 'TARGET_TABLE_NAME');
Oh, and since the actual question was to target a specific column, you can specify the column name with this one:
select (select r.relname from pg_class r where r.oid = c.conrelid) as table,
(select array_agg(attname) from pg_attribute
where attrelid = c.conrelid and ARRAY[attnum] <# c.conkey) as col,
(select r.relname from pg_class r where r.oid = c.confrelid) as ftable
from pg_constraint c
where c.confrelid = (select oid from pg_class where relname = 'TARGET_TABLE_NAME') and
c.confkey #> (select array_agg(attnum) from pg_attribute
where attname = 'TARGET_COLUMN_NAME' and attrelid = c.confrelid);

This query requires only the referenced table name and column name, and produces a result set containing both sides of the foreign key.
select confrelid::regclass, af.attname as fcol,
conrelid::regclass, a.attname as col
from pg_attribute af, pg_attribute a,
(select conrelid,confrelid,conkey[i] as conkey, confkey[i] as confkey
from (select conrelid,confrelid,conkey,confkey,
generate_series(1,array_upper(conkey,1)) as i
from pg_constraint where contype = 'f') ss) ss2
where af.attnum = confkey and af.attrelid = confrelid and
a.attnum = conkey and a.attrelid = conrelid
AND confrelid::regclass = 'my_table'::regclass AND af.attname = 'my_referenced_column';
Example result set:
confrelid | fcol | conrelid | col
----------+----------------------+---------------+-------------
my_table | my_referenced_column | some_relation | source_type
my_table | my_referenced_column | some_feature | source_type
All credit to Lane and Krogh at the PostgreSQL forum.

Personally, I prefer to query based on the referenced unique constraint rather than the column. That would look something like this:
SELECT rc.constraint_catalog,
rc.constraint_schema||'.'||tc.table_name AS table_name,
kcu.column_name,
match_option,
update_rule,
delete_rule
FROM information_schema.referential_constraints AS rc
JOIN information_schema.table_constraints AS tc USING(constraint_catalog,constraint_schema,constraint_name)
JOIN information_schema.key_column_usage AS kcu USING(constraint_catalog,constraint_schema,constraint_name)
WHERE unique_constraint_catalog='catalog'
AND unique_constraint_schema='schema'
AND unique_constraint_name='constraint name';
Here is a version that allows querying by column name:
SELECT rc.constraint_catalog,
rc.constraint_schema||'.'||tc.table_name AS table_name,
kcu.column_name,
match_option,
update_rule,
delete_rule
FROM information_schema.referential_constraints AS rc
JOIN information_schema.table_constraints AS tc USING(constraint_catalog,constraint_schema,constraint_name)
JOIN information_schema.key_column_usage AS kcu USING(constraint_catalog,constraint_schema,constraint_name)
JOIN information_schema.key_column_usage AS ccu ON(ccu.constraint_catalog=rc.unique_constraint_catalog AND ccu.constraint_schema=rc.unique_constraint_schema AND ccu.constraint_name=rc.unique_constraint_name)
WHERE ccu.table_catalog='catalog'
AND ccu.table_schema='schema'
AND ccu.table_name='name'
AND ccu.column_name='column';

SELECT
main_table.table_name AS main_table_table_name,
main_table.column_name AS main_table_column_name,
main_table.constraint_name AS main_table_constraint_name,
info_other_table.table_name AS info_other_table_table_name,
info_other_table.constraint_name AS info_other_table_constraint_name,
info_other_table.column_name AS info_other_table_column_name
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE main_table
INNER JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS other_table
ON other_table.unique_constraint_name = main_table.constraint_name
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE info_other_table
ON info_other_table.constraint_name = other_table.constraint_name
WHERE main_table.table_name = 'MAIN_TABLE_NAME';

A simple request for recovered the names of foreign key as well as the names of the tables:
SELECT CONSTRAINT_NAME, table_name
FROM
information_schema.table_constraints
WHERE table_schema='public' and constraint_type='FOREIGN KEY'

If you use the psql client, you can simply issue the \d table_name command to see which tables reference the given table. From the linked documentation page:
\d[S+] [ pattern ]
For each relation (table, view, materialized view, index, sequence, or foreign table) or composite type matching the pattern,
show all columns, their types, the tablespace (if not the default) and
any special attributes such as NOT NULL or defaults. Associated
indexes, constraints, rules, and triggers are also shown. For foreign
tables, the associated foreign server is shown as well.

Table constraints can include multiple columns. The trick to getting this right is to join each column by their constraint ordinal positions. If you don't join correctly your script will blow up with duplicate rows 😥 whenever a table has multiple columns in a unique constraint.
Query
Lists all foreign key columns and their references.
select
-- unique reference info
ref.table_catalog as ref_database,
ref.table_schema as ref_schema,
ref.table_name as ref_table,
ref.column_name as ref_column,
refd.constraint_type as ref_type, -- e.g. UNIQUE or PRIMARY KEY
-- foreign key info
fk.table_catalog as fk_database,
fk.table_schema as fk_schema,
fk.table_name as fk_table,
fk.column_name as fk_column,
map.update_rule as fk_on_update,
map.delete_rule as fk_on_delete
-- lists fk constraints and maps them to pk constraints
from information_schema.referential_constraints as map
-- join unique constraints (e.g. PKs constraints) to ref columns info
inner join information_schema.key_column_usage as ref
on ref.constraint_catalog = map.unique_constraint_catalog
and ref.constraint_schema = map.unique_constraint_schema
and ref.constraint_name = map.unique_constraint_name
-- optional: to include reference constraint type
left join information_schema.table_constraints as refd
on refd.constraint_catalog = ref.constraint_catalog
and refd.constraint_schema = ref.constraint_schema
and refd.constraint_name = ref.constraint_name
-- join fk columns to the correct ref columns using ordinal positions
inner join information_schema.key_column_usage as fk
on fk.constraint_catalog = map.constraint_catalog
and fk.constraint_schema = map.constraint_schema
and fk.constraint_name = map.constraint_name
and fk.position_in_unique_constraint = ref.ordinal_position --IMPORTANT!
Helpful links
information_schema.table_constraints
information_schema.referential_constraints
information_schema.key_column_usage
Explanation
consider the relationship between these to tables.
create table foo (
a int,
b int,
primary key (a,b)
);
create table bar (
c int,
d int,
foreign key (c,d) references foo (b,a) -- i flipped a,b to make a point later.
);
get table constraint names
select * from information_schema.table_constraints where table_name in ('foo','bar');
| constraint_name | table_name | constraint_type |
| --------------- | ---------- | --------------- |
| foo_pkey | foo | PRIMARY KEY |
| bar_c_d_fkey | bar | FOREIGN KEY |
constraint references
select * from information_schema.referential_constraints where constraint_name in ('bar_c_d_fkey');
| constraint_name | unique_constraint_name |
| --------------- | ---------------------- |
| bar_c_d_fkey | foo_pkey |
constraint ordinal_position of column.
select * from information_schema.key_column_usage where table_name in ('foo','bar');
| constraint_name | table_name | column_name | ordinal_position | position_in_unique_constraint |
| --------------- | ---------- | ----------- | ---------------- | ----------------------------- |
| foo_pkey | foo | a | 1 | null |
| foo_pkey | foo | b | 2 | null |
| bar_c_d_fkey | bar | c | 1 | 2 |
| bar_c_d_fkey | bar | d | 2 | 1 |
Now all that's left is to join them together. The main query above is one way you could do so.

I turned #Tony K's answer into a reusable function that takes in a schema/table/column tuple and returns all tables that have a foreign key relationship: https://gist.github.com/colophonemes/53b08d26bdd219e6fc11677709e8fc6c
I needed something like this in order to implement a script that merged two records into a single record.
Function:
CREATE SCHEMA utils;
-- Return type for the utils.get_referenced_tables function
CREATE TYPE utils.referenced_table_t AS (
constraint_name name,
schema_name name,
table_name name,
column_name name[],
foreign_schema_name name,
foreign_table_name name
);
/*
A function to get all downstream tables that are referenced to a table via a foreign key relationship
The function looks at all constraints that contain a reference to the provided schema-qualified table column
It then generates a list of the schema/table/column tuples that are the target of these references
Idea based on https://stackoverflow.com/a/21125640/7114675
Postgres built-in reference:
- pg_namespace => schemas
- pg_class => tables
- pg_attribute => table columns
- pg_constraint => constraints
*/
CREATE FUNCTION utils.get_referenced_tables (schema_name name, table_name name, column_name name)
RETURNS SETOF utils.referenced_table_t AS $$
-- Wrap the internal query in a select so that we can order it more easily
SELECT * FROM (
-- Get human-readable names for table properties by mapping the OID's stored on the pg_constraint
-- table to the underlying value on their relevant table.
SELECT
-- constraint name - we get this directly from the constraints table
pg_constraint.conname AS constraint_name,
-- schema_name
(
SELECT pg_namespace.nspname FROM pg_namespace
WHERE pg_namespace.oid = pg_constraint.connamespace
) as schema_name,
-- table_name
(
SELECT pg_class.relname FROM pg_class
WHERE pg_class.oid = pg_constraint.conrelid
) as table_name,
-- column_name
(
SELECT array_agg(attname) FROM pg_attribute
WHERE attrelid = pg_constraint.conrelid
AND ARRAY[attnum] <# pg_constraint.conkey
) AS column_name,
-- foreign_schema_name
(
SELECT pg_namespace.nspname FROM pg_namespace
WHERE pg_namespace.oid = (
SELECT pg_class.relnamespace FROM pg_class
WHERE pg_class.oid = pg_constraint.confrelid
)
) AS foreign_schema_name,
-- foreign_table_name
(
SELECT pg_class.relname FROM pg_class
WHERE pg_class.oid = pg_constraint.confrelid
) AS foreign_table_name
FROM pg_constraint
-- confrelid = constraint foreign relation id = target schema + table
WHERE confrelid IN (
SELECT oid FROM pg_class
-- relname = target table name
WHERE relname = get_referenced_tables.table_name
-- relnamespace = target schema
AND relnamespace = (
SELECT oid FROM pg_namespace
WHERE nspname = get_referenced_tables.schema_name
)
)
-- confkey = constraint foreign key = the column on the foreign table linked to the target column
AND confkey #> (
SELECT array_agg(attnum) FROM pg_attribute
WHERE attname = get_referenced_tables.column_name
AND attrelid = pg_constraint.confrelid
)
) a
ORDER BY
schema_name,
table_name,
column_name,
foreign_table_name,
foreign_schema_name
;
$$ LANGUAGE SQL STABLE;
Example usage:
/*
Function to merge two people into a single person
The primary person (referenced by primary_person_id) will be retained, the secondary person
will have all their records re-referenced to the primary person, and then the secondary person
will be deleted
Note that this function may be destructive! For most tables, the records will simply be merged,
but in cases where merging would violate a UNIQUE or EXCLUSION constraint, the secondary person's
respective records will be dropped. For example, people cannot have overlapping pledges (on the
pledges.pledge table). If the secondary person has a pledge that overlaps with a pledge that is
on record for the primary person, the secondary person's pledge will just be deleted.
*/
CREATE FUNCTION utils.merge_person (primary_person_id BIGINT, secondary_person_id BIGINT)
RETURNS people.person AS $$
DECLARE
_referenced_table utils.referenced_table_t;
_col name;
_exec TEXT;
_primary_person people.person;
BEGIN
-- defer all deferrable constraints
SET CONSTRAINTS ALL DEFERRED;
-- This loop updates / deletes all referenced tables, setting the person_id (or equivalent)
-- From secondary_person_id => primary_person_id
FOR _referenced_table IN (SELECT * FROM utils.get_referenced_tables('people', 'person', 'id')) LOOP
-- the column_names are stored as an array, so we need to loop through these too
FOREACH _col IN ARRAY _referenced_table.column_name LOOP
RAISE NOTICE 'Merging %.%(%)', _referenced_table.schema_name, _referenced_table.table_name, _col;
-- FORMAT allows us to safely build a dynamic SQL string
_exec = FORMAT(
$sql$ UPDATE %s.%s SET %s = $1 WHERE %s = $2 $sql$,
_referenced_table.schema_name,
_referenced_table.table_name,
_col,
_col
);
RAISE NOTICE 'SQL: %', _exec;
-- wrap the execution in a block so that we can handle uniqueness violations
BEGIN
EXECUTE _exec USING primary_person_id, secondary_person_id;
RAISE NOTICE 'Merged %.%(%) OK!', _referenced_table.schema_name, _referenced_table.table_name, _col;
EXCEPTION
-- Error codes are Postgres built-ins, see https://www.postgresql.org/docs/9.6/errcodes-appendix.html
WHEN unique_violation OR exclusion_violation THEN
RAISE NOTICE 'Cannot merge record with % = % on table %.%, falling back to deletion!', _col, secondary_person_id, _referenced_table.schema_name, _referenced_table.table_name;
_exec = FORMAT(
$sql$ DELETE FROM %s.%s WHERE %s = $1 $sql$,
_referenced_table.schema_name,
_referenced_table.table_name,
_col
);
RAISE NOTICE 'SQL: %', _exec;
EXECUTE _exec USING secondary_person_id;
RAISE WARNING 'Deleted record with % = % on table %.%', _col, secondary_person_id, _referenced_table.schema_name, _referenced_table.table_name;
END;
END LOOP;
END LOOP;
-- Once we've updated all the tables, we can safely delete the secondary person
RAISE WARNING 'Deleted person with id = %', secondary_person_id;
-- Get our primary person so that we can return them
SELECT * FROM people.person WHERE id = primary_person_id INTO _primary_person;
RETURN _primary_person;
END
$$ LANGUAGE plpgsql VOLATILE;
Note the use of SET CONSTRAINTS ALL DEFERRED; in the function, which ensures that foreign key relationships are checked at the end of the merge. You may need to update your constraints to be DEFERRABLE INITIALLY DEFERRED:
ALTER TABLE settings.contact_preference
DROP CONSTRAINT contact_preference_person_id_fkey,
DROP CONSTRAINT person_id_current_address_id_fkey,
ADD CONSTRAINT contact_preference_person_id_fkey
FOREIGN KEY (person_id)
REFERENCES people.person(id)
ON UPDATE CASCADE ON DELETE CASCADE
DEFERRABLE INITIALLY IMMEDIATE,
ADD CONSTRAINT person_id_current_address_id_fkey
FOREIGN KEY (person_id, current_address_id)
REFERENCES people.address(person_id, id)
DEFERRABLE INITIALLY IMMEDIATE
;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas