Transform database columns to rows and sum - sql

I'm building a system where I have to find the combined price of a computer system by using the database data. The first screenshot is a build from the system table.
[Screenshots not reproduced: Systems table, Parts table]
The different kinds are: motherboard, case, ram, cpu, graphic.
What I need is some way of turning the columns into rows and thereby summing the prices of each system.
Here is the table and content.
CREATE TABLE Components (
nome VARCHAR(30),
kind VARCHAR(10), /*cpu, ram, mainboard, cases, gfx*/
price INT,
PRIMARY KEY(nome)
);
CREATE TABLE Computer_system (
nome VARCHAR(30),
ram VARCHAR(20),
cpu VARCHAR(20),
mainboard VARCHAR(20),
cases VARCHAR(20),
gfx VARCHAR(20),
PRIMARY KEY(nome)
);
INSERT INTO Computer_system VALUES('SERVER1','D31','XEON1','LGA2011_D3_E_OGFX','CASE_A',null);
INSERT INTO Computer_system VALUES('SERVER2','D43','XEON3','LGA2011_D4_E_OGFX','CASE_A',null);
INSERT INTO Computer_system VALUES('CONSUMER1','D43','I71','LGA1150_D4_ATX_OGFX','CASE_B',null);
INSERT INTO Computer_system VALUES('GAMING1', 'D51', 'FX','AM3+_D5_ATX','BLACK_PEARL', 'NVIDIA_TITAN_BLACK_X');
INSERT INTO Computer_system VALUES('BUDGETO', 'D31', 'XEON1','LGA2011_D3_ATX','CASE_B', null);

There's a neat trick for unpivoting in Postgres using unnest(array[...]).
This efficiently (in one pass of the table) unpivots those multiple columns of table computer_system into multiple rows of (in this case) 3 columns: "nome", "colkind" and "colnome". An example of the unpivoted data:
| nome | colkind | colnome |
|-----------|-----------|----------------------|
| BUDGETO | ram | D31 |
| BUDGETO | gfx | (null) |
| BUDGETO | cases | CASE_B |
| BUDGETO | mainboard | LGA2011_D3_ATX |
| BUDGETO | cpu | XEON1 |
Once that data is available in that format it is simple to join to the Components table, like this:
SELECT
*
FROM (
/* this "unpivots" the source data */
SELECT
nome
, unnest(array[ram, cpu, mainboard,cases,gfx]) AS colnome
, unnest(array['ram', 'cpu', 'mainboard','cases','gfx']) AS colkind
FROM Computer_system
) unpiv
INNER JOIN Components c ON unpiv.colnome = c.nome AND unpiv.colkind = c.kind
;
From here it is simple to arrive at this result:
| nome | sum_price |
|-----------|-----------|
| BUDGETO | 291 |
| GAMING1 | 515 |
| CONSUMER1 | 292 |
| SERVER1 | 285 |
| SERVER2 | 289 |
using:
SELECT
unpiv.nome, sum(c.price) sum_price
FROM (
/* this "unpivots" the source data */
SELECT
nome
, unnest(array[ram, cpu, mainboard,cases,gfx]) AS colnome
, unnest(array['ram', 'cpu', 'mainboard','cases','gfx']) AS colkind
FROM Computer_system
) unpiv
INNER JOIN Components c ON unpiv.colnome = c.nome AND unpiv.colkind = c.kind
GROUP BY
unpiv.nome
;
See this SQLfiddle demo, and please take note of the execution plan:
QUERY PLAN
HashAggregate (cost=487.00..488.00 rows=100 width=82)
-> Hash Join (cost=23.50..486.50 rows=100 width=82)
Hash Cond: ((((unnest(ARRAY[computer_system.ram, computer_system.cpu, computer_system.mainboard, computer_system.cases, computer_system.gfx])))::text = (c.nome)::text) AND ((unnest('{ram,cpu,mainboard,cases,gfx}'::text[])) = (c.kind)::text))
-> Seq Scan on computer_system (cost=0.00..112.00 rows=20000 width=368)
-> Hash (cost=15.40..15.40 rows=540 width=120)
-> Seq Scan on components c (cost=0.00..15.40 rows=540 width=120)
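If you don't have a Postgres instance at hand, the unpivot-then-join idea can be tried in SQLite through Python's sqlite3 module. SQLite has no unnest(), so this sketch swaps in one UNION ALL branch per column. That is a different, multi-pass technique, but it produces the same (nome, colkind, colnome) row shape. The component prices below are invented for illustration only; the question does not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Components (nome TEXT PRIMARY KEY, kind TEXT, price INTEGER);
CREATE TABLE Computer_system (
    nome TEXT PRIMARY KEY, ram TEXT, cpu TEXT,
    mainboard TEXT, cases TEXT, gfx TEXT
);
-- Hypothetical prices, made up for this sketch.
INSERT INTO Components VALUES
    ('D31', 'ram', 40),
    ('XEON1', 'cpu', 200),
    ('LGA2011_D3_ATX', 'mainboard', 30),
    ('CASE_B', 'cases', 20);
INSERT INTO Computer_system VALUES
    ('BUDGETO', 'D31', 'XEON1', 'LGA2011_D3_ATX', 'CASE_B', NULL);
""")

# One UNION ALL branch per column turns the five columns into five rows.
query = """
SELECT unpiv.nome, SUM(c.price) AS sum_price
FROM (
    SELECT nome, 'ram' AS colkind, ram AS colnome FROM Computer_system
    UNION ALL SELECT nome, 'cpu', cpu FROM Computer_system
    UNION ALL SELECT nome, 'mainboard', mainboard FROM Computer_system
    UNION ALL SELECT nome, 'cases', cases FROM Computer_system
    UNION ALL SELECT nome, 'gfx', gfx FROM Computer_system
) AS unpiv
INNER JOIN Components c
    ON unpiv.colnome = c.nome AND unpiv.colkind = c.kind
GROUP BY unpiv.nome
"""
totals = conn.execute(query).fetchall()
print(totals)
```

The row with a NULL gfx simply finds no match in the inner join, so it contributes nothing to the sum.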

I think you need to break your table design down into three tables: Component, Computer_System and Computer_Component. Here are the field lists:
Computer_System -> computer_id, name
Component -> nome_component, kind, price
Computer_Component -> computer_id, nome_component
With those tables, you can sum the total price for each computer_id by joining them: Computer_System a JOIN Computer_Component b ON a.computer_id = b.computer_id JOIN Component c ON b.nome_component = c.nome_component
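Here is a minimal sketch of that three-table layout, run against an in-memory SQLite database via Python's sqlite3 module. The column types, the sample rows and the prices are assumptions made up for the example; only the table and column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Computer_System (computer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Component (
    nome_component TEXT PRIMARY KEY, kind TEXT, price INTEGER
);
-- Link table: one row per part used in a system.
CREATE TABLE Computer_Component (
    computer_id INTEGER REFERENCES Computer_System (computer_id),
    nome_component TEXT REFERENCES Component (nome_component),
    PRIMARY KEY (computer_id, nome_component)
);
INSERT INTO Computer_System VALUES (1, 'GAMING1'), (2, 'BUDGETO');
-- Hypothetical prices, made up for this sketch.
INSERT INTO Component VALUES
    ('D51', 'ram', 60), ('FX', 'cpu', 150),
    ('D31', 'ram', 40), ('XEON1', 'cpu', 200);
INSERT INTO Computer_Component VALUES
    (1, 'D51'), (1, 'FX'), (2, 'D31'), (2, 'XEON1');
""")

rows = conn.execute("""
SELECT a.name, SUM(c.price) AS total_price
FROM Computer_System a
JOIN Computer_Component b ON a.computer_id = b.computer_id
JOIN Component c ON b.nome_component = c.nome_component
GROUP BY a.computer_id, a.name
ORDER BY a.name
""").fetchall()
print(rows)
```

With this layout a system can hold any number of parts of the same kind, and the price query no longer needs to know how many part columns exist.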

You can do it simply by joining the Computer_system table with Components once for each kind, as in the query below:
select c.nome as name,
(coalesce(ram.price,0)
+coalesce(cpu.price,0)
+coalesce(mainboard.price,0)
+coalesce(cases.price,0)
+coalesce(gfx.price,0)) as price
from Computer_system c
left join Components ram on c.ram=ram.nome
left join Components cpu on c.cpu=cpu.nome
left join Components mainboard on c.mainboard=mainboard.nome
left join Components cases on c.cases=cases.nome
left join Components gfx on c.gfx=gfx.nome;
SQLFIDDLE DEMO

This is tricky to do because your table structures aren't suited to this type of query. Also, it is not flexible in case you want more than one gfx in a build.
Here is my suggested answer:
select sum(price)
from components
where nome in (
select ram from computer_system where nome = 'GAMING1'
UNION ALL
select cpu from computer_system where nome = 'GAMING1'
UNION ALL
select mainboard from computer_system where nome = 'GAMING1'
UNION ALL
select cases from computer_system where nome = 'GAMING1'
UNION ALL
select gfx from computer_system where nome = 'GAMING1'
)
;
And here it is in a working fiddle: http://sqlfiddle.com/#!15/228d7/8
If I restructured the tables, I would make them something like in this fiddle: http://sqlfiddle.com/#!15/f4ed06/1 with an extra parts_list table:
CREATE TABLE parts_list (
system_nome VARCHAR(30),
component_kind VARCHAR(10),
component_nome VARCHAR(30),
PRIMARY KEY (system_nome, component_kind, component_nome)
);
and your query for the cost of the GAMING1 system becomes much simpler:
select sum(price)
from components as c
inner join parts_list as PL ON c.kind = pl.component_kind and c.nome = pl.component_nome
where pl.system_nome = 'GAMING1'
;

Related

Joining two joins in PostgreSQL

Background
I've needed to learn some PostgreSQL quickly and from scratch in order to do a data analysis project about car insurance. I have a locally stored PostgreSQL database of fairly decent size (around 8 GB worth of data on insurance claims for vehicles like cars and motorcycles), and I've needed to JOIN and UNION ALL a couple of things in order to get the table I need for my statistical models.
The first part of what I've needed to do is this thing, a JOIN inside of a UNION ALL between two tables about car claims and motorcycle claims:
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.service_date,
h.principal_problem_cd,
h.problem_code_vers_flag
from claims.auto_claims_line_items as l
JOIN claims.auto_claims_general h on l.claim_id = h.claim_id
UNION ALL
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.entry_date as service_date,
NULL as principal_problem_cd,
NULL as problem_code_vers_flag
from claims.motorcycle_claims_line_items as l
This yields a table that looks like this (column names abbreviated for aesthetics):
cust_comb_id| claim_id | "Part_Cd" | svc_date | prin_prob_cd | prob_cd_vers_flg |
------------+----------+-----------+----------+--------------+------------------|
| | | | | |
As you can see, the car claims have some columns that the motorcycle claims don't have. This is fine -- I've filled those in as NULL in order to get the UNION ALL to work. Now the car claims table is nicely stacked on top of the motorcycle claims table. So far, so good.
The second part of what I've done so far is this other thing, which concerns data about car and motorcycle insurance policyholders ("customers"):
select m.customer_dob,
m.customer_id,
m.customer_gender_cd,
m.customer_zip_cd,
c.customer_combined_id
from customer."Customer" m
JOIN customer.customer_combined_crosswalk c on m.customer_id = c.customer_id
The result of which looks like this:
dob | customer_id | gender_cd | zip_cd | cust_comb_id |
----+-------------+-----------+--------+--------------|
| | | | |
The Problem
I've figured out two halves of my data manipulation, but I don't know how to put these halves together, so to speak. I want (I think) to left join these two things on cust_comb_id, but I'm not sure how to write it. I want to keep everything in the first part (the claim data) and bring in data from the second part (the policyholders / customers) when cust_comb_id matches, and give null values if it doesn't. Here's a visual of what I'm looking for:
cust_comb_id| claim_id | "Part_Cd" | svc_date | prin_prob_cd | prob_cd_vers_flg |dob | cust_id | gender_cd | zip_cd |
------------+----------+-----------+----------+--------------+------------------|----+---------+-----------+--------+
| | | | | | | | | |
What I've tried
I've tried to use subqueries to join these joins, but I keep getting errors. Edit:
Here's a concrete example of something I've tried:
select *
from
(select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.service_date,
h.principal_problem_cd,
h.problem_code_vers_flag
from claims.auto_claims_line_items as l
JOIN claims.auto_claims_general h on l.claim_id = h.claim_id
UNION ALL
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.entry_date as service_date,
NULL as principal_problem_cd,
NULL as problem_code_vers_flag
from claims.motorcycle_claims_line_items as l) as cl
LEFT JOIN
select m.customer_dob,
m.customer_id,
m.customer_gender_cd,
m.customer_zip_cd,
c.customer_combined_id
from customer."Customer" m
JOIN customer.customer_combined_crosswalk c on m.customer_id = c.customer_id
This yields the error ERROR: syntax error at or near "select".
Any help is much appreciated.
[Note: customer_combined_id and customer_id are two different things: the combined id is unique, and made to account for when a customer switches from one insurance plan - where they have one customer_id - to another, where they're given a new one.]
So it was a syntax issue.
The OP already had all the needed parts: the Part I and Part II subqueries were already implemented, and it was defined how to join them.
The only problem was a struggle with the syntax.
I suppose this form would be the most readable:
WITH PartI AS(
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.service_date,
h.principal_problem_cd,
h.problem_code_vers_flag
from claims.auto_claims_line_items as l
JOIN claims.auto_claims_general h on l.claim_id = h.claim_id
UNION ALL
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.entry_date as service_date,
NULL as principal_problem_cd,
NULL as problem_code_vers_flag
from claims.motorcycle_claims_line_items as l
),
PartII AS (
select m.customer_dob,
m.customer_id,
m.customer_gender_cd,
m.customer_zip_cd,
c.customer_combined_id
from customer."Customer" m
JOIN customer.customer_combined_crosswalk c on m.customer_id = c.customer_id
)
SELECT
*
FROM
PartI P1
LEFT JOIN PartII P2
ON P1.customer_combined_id = P2.customer_combined_id;
https://www.db-fiddle.com/f/msAtD89dn4DndMtxukkgkP/2
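For completeness, the subquery form from the question also works once the second SELECT is wrapped in parentheses, given an alias, and followed by an ON clause. The sketch below shows that shape in SQLite through Python's sqlite3 module. The three tables are simplified, hypothetical stand-ins for the claims and customer tables, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical stand-ins for the real tables.
CREATE TABLE claims (customer_combined_id INTEGER, claim_id INTEGER);
CREATE TABLE customer (customer_id INTEGER, customer_zip_cd TEXT);
CREATE TABLE crosswalk (customer_id INTEGER, customer_combined_id INTEGER);
INSERT INTO claims VALUES (10, 1), (20, 2);
INSERT INTO customer VALUES (100, '12345');
INSERT INTO crosswalk VALUES (100, 10);
""")

# Each derived table is parenthesized and aliased; the join has an ON clause.
rows = conn.execute("""
SELECT cl.customer_combined_id, cl.claim_id,
       cu.customer_id, cu.customer_zip_cd
FROM (SELECT customer_combined_id, claim_id FROM claims) AS cl
LEFT JOIN (
    SELECT m.customer_id, m.customer_zip_cd, c.customer_combined_id
    FROM customer m
    JOIN crosswalk c ON m.customer_id = c.customer_id
) AS cu ON cl.customer_combined_id = cu.customer_combined_id
ORDER BY cl.claim_id
""").fetchall()
print(rows)
```

The claim with no matching customer is kept, with NULLs (None in Python) in the customer columns, which is the behaviour the question asks for.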
Alex Yu's answer is better, but I wanted to post this because a) it also works and b) shows a neat use for views in SQL.
Take the first part, and make a view of it by adding a single line of CREATE OR REPLACE VIEW before the first select:
CREATE OR REPLACE VIEW clms AS
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.service_date,
h.principal_problem_cd,
h.problem_code_vers_flag
from claims.auto_claims_line_items as l
JOIN claims.auto_claims_general h on l.claim_id = h.claim_id
UNION ALL
select
l.customer_combined_id,
l.claim_id,
l."Part_Cd",
l.entry_date as service_date,
NULL as principal_problem_cd,
NULL as problem_code_vers_flag
from claims.motorcycle_claims_line_items as l
Next, do the same for the second part:
CREATE OR REPLACE VIEW cstmr AS
select m.customer_dob,
m.customer_id,
m.customer_gender_cd,
m.customer_zip_cd,
c.customer_combined_id
from customer."Customer" m
JOIN customer.customer_combined_crosswalk c on m.customer_id = c.customer_id
Finally, do a SQL 101-level simple join of the two views (a left join, so that claims without a matching customer are kept, as the question asks):
select *
from clms
left join cstmr m on clms.customer_combined_id = m.customer_combined_id
I bumped into this answer after posting the problem and was happy to find a (somewhat) elegant solution myself.

Return multiple columns from subquery as Columns

Here is my data.
Products: <Where all list of SKU are stored>
Prod_ID | Prod_Desc | Base_Unit_ID
1 | Custom Product | 1
UOM: <Masterlist for Unit of measures>
UOM_ID | Desc
1 | Piece
2 | Box
3 | Case
UOM_Conversion: <From base unit. Multiplier is how many base unit in a To_Unit>
Prod_ID | from_Unit_ID | Multiplier | To_Unit_ID
1 | 1 | 100 | 2
1 | 1 | 400 | 3
Given this data, how can I display it like this?
Product | Base Unit | Pack Unit | Multiplier | Case Unit | Multiplier
Custom Product | Piece | Box | 100 | Case | 400
I have tried LEFT JOIN LATERAL, but sadly it returns two rows.
The reason I want to do this is that I'm developing a module which stores products with multiple units of measure (max of 3). I don't want to create 3 columns for Unit, Pack and Case, hence the UOM_conversion table.
I think there is a broader misconception here, but I will come to it after the solution to your problem. I advise you to look at it.
So firstly: what you need here is the crosstab() function, which is provided by the tablefunc extension:
CREATE EXTENSION tablefunc;
For the following script:
create table uom
(
uom_id int primary key
, description varchar (250)
);
create table products
(
prod_id int primary key
, prod_desc varchar(250)
, base_unit_id int references uom (uom_id)
);
create table uom_conversion
(
prod_id int references products (prod_id)
, from_unit_id int references uom (uom_id)
, multiplier int
, to_unit_id int references uom (uom_id)
);
insert into uom values (1, 'Piece'), (2, 'Box'), (3, 'Case');
insert into products values (1, 'Custom Product', 1);
insert into uom_conversion values (1,1,100,2), (1,1,400,3);
The query is:
select
p.prod_desc as "Product"
, u.description as "Base Unit"
, u2.description as "Pack Unit"
, final_res."1" as "Multiplier"
, u3.description as "Case Unit"
, final_res."2" as "Multiplier"
from crosstab(
'select
p.prod_id
, base_unit_id
, multiplier
from products p
inner join uom_conversion uc
on uc.prod_id = p.prod_id')
as final_res (prod_id int, "1" int, "2" int)
inner join crosstab('select
uc.prod_id
, u.description
, uc.to_unit_id
from uom_conversion uc
inner join uom u
on u.uom_id = uc.to_unit_id')
as final_res_2 (prod_id int, "Box" int, "Case" int)
on final_res.prod_id = final_res_2.prod_id
inner join products p
on p.prod_id = final_res.prod_id
inner join uom u
on p.base_unit_id = u.uom_id
inner join uom u2
on u2.uom_id = final_res_2."Box"
inner join uom u3
on u3.uom_id = final_res_2."Case";
This solves your problem. BUT: how do you know which conversion is the pack unit and which is the case unit? I think many more questions will come up from this one.
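One way to make that ordering explicit, without crosstab(), is conditional aggregation: pick each conversion by its to_unit_id instead of by position. The sketch below runs in SQLite through Python's sqlite3 module and hard-codes an assumption that is not in the question, namely that uom_id 2 is always the pack unit and uom_id 3 is always the case unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uom (uom_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE products (
    prod_id INTEGER PRIMARY KEY, prod_desc TEXT, base_unit_id INTEGER
);
CREATE TABLE uom_conversion (
    prod_id INTEGER, from_unit_id INTEGER,
    multiplier INTEGER, to_unit_id INTEGER
);
INSERT INTO uom VALUES (1, 'Piece'), (2, 'Box'), (3, 'Case');
INSERT INTO products VALUES (1, 'Custom Product', 1);
INSERT INTO uom_conversion VALUES (1, 1, 100, 2), (1, 1, 400, 3);
""")

# Assumption: to_unit_id 2 = pack unit, to_unit_id 3 = case unit.
row = conn.execute("""
SELECT p.prod_desc,
       bu.description AS base_unit,
       MAX(CASE WHEN uc.to_unit_id = 2 THEN tu.description END) AS pack_unit,
       MAX(CASE WHEN uc.to_unit_id = 2 THEN uc.multiplier END) AS pack_multiplier,
       MAX(CASE WHEN uc.to_unit_id = 3 THEN tu.description END) AS case_unit,
       MAX(CASE WHEN uc.to_unit_id = 3 THEN uc.multiplier END) AS case_multiplier
FROM products p
JOIN uom bu ON bu.uom_id = p.base_unit_id
LEFT JOIN uom_conversion uc ON uc.prod_id = p.prod_id
LEFT JOIN uom tu ON tu.uom_id = uc.to_unit_id
GROUP BY p.prod_id, p.prod_desc, bu.description
""").fetchone()
print(row)
```

A product without a pack or case conversion still appears, with NULLs in the corresponding columns. If the pack/case roles cannot be tied to fixed unit ids, the conversion table would need a column that records the role.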

group by not grouping aggregate?

Let's say I am trying to build an opinion poll app, such that I can create a template of an opinion poll, give it multiple sections/questions, assign multiple people to different copies of a given question, create varying measures (happiness, successfulness, greenness) and assign different questions different weights to apply to all of these measures.
Something like so:
CREATE TABLE users (
id SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE opinion_poll_templates (
id SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE opinion_poll_instances (
id SERIAL NOT NULL PRIMARY KEY,
template_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE section_templates (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE section_instances (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
template_id INTEGER NOT NULL REFERENCES section_templates(id)
);
CREATE TABLE question_templates (
id SERIAL NOT NULL PRIMARY KEY,
section_id INTEGER NOT NULL REFERENCES section_templates(id)
);
CREATE TABLE measure_templates (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_templates(id)
);
CREATE TABLE answer_options (
id SERIAL NOT NULL PRIMARY KEY,
question_template_id INTEGER NOT NULL REFERENCES question_templates(id),
weight FLOAT8
);
CREATE TABLE question_instances (
id SERIAL NOT NULL PRIMARY KEY,
template_id INTEGER NOT NULL REFERENCES question_templates(id),
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
section_id INTEGER NOT NULL REFERENCES section_instances(id),
answer_option_id INTEGER NOT NULL REFERENCES answer_options(id),
contributor_id INTEGER
);
CREATE TABLE measure_instances (
id SERIAL NOT NULL PRIMARY KEY,
opinion_poll_id INTEGER NOT NULL REFERENCES opinion_poll_instances(id),
template_id INTEGER NOT NULL REFERENCES measure_templates(id),
total_score INTEGER
);
CREATE TABLE scores (
id SERIAL NOT NULL PRIMARY KEY,
question_template_id INTEGER NOT NULL REFERENCES question_templates(id),
measure_template_id INTEGER NOT NULL REFERENCES measure_templates(id),
score INTEGER NOT NULL
);
Now let's say I am interested in the per measureInstance (per measure assigned to an opinion poll) cross question, cross user average?
WITH weighted_score AS (
SELECT AVG(answer_options.weight), measure_instances.id
FROM question_instances
INNER JOIN answer_options ON question_instances.template_id = answer_options.question_template_id
INNER JOIN scores ON question_instances.template_id = scores.question_template_id
INNER JOIN measure_instances ON measure_instances.template_id=scores.measure_template_id
WHERE measure_instances.opinion_poll_id = question_instances.opinion_poll_id
GROUP BY measure_instances.id
)
UPDATE measure_instances
SET total_score=(SELECT avg FROM weighted_score
WHERE weighted_score.id = measure_instances.id)*100
RETURNING total_score;
This seems not only to fail to group as expected, but also to produce incorrect results.
Why is the result an integer rather than a float? Why is the result not being grouped by measure instance, instead being identical across all?
And why is the result incorrect for any of them?
A demonstration: http://sqlfiddle.com/#!15/dcce8/1
EDIT: In working through explaining exactly what I wanted, I realized the source of my problem was that I was simply adding percentages, rather than normalizing across questions as a percentage.
My new and improved SQL is:
WITH per_question_percentage AS (
SELECT SUM(answer_options.weight)/COUNT(question_instances.id) percentage, question_templates.id qid, opinion_poll_instances.id oid
FROM question_instances
INNER JOIN answer_options ON question_instances.answer_option_id = answer_options.id
INNER JOIN question_templates ON question_templates.id = question_instances.template_id
INNER JOIN opinion_poll_instances ON opinion_poll_instances.id = question_instances.opinion_poll_id
GROUP BY question_templates.id, opinion_poll_instances.id
), max_per_measure AS (
SELECT SUM(scores.score), measure_instances.id mid, measure_instances.opinion_poll_id oid
FROM measure_instances
INNER JOIN scores ON scores.measure_template_id=measure_instances.template_id
GROUP BY measure_instances.id, measure_instances.opinion_poll_id
), per_measure_per_opinion_poll AS (
SELECT per_question_percentage.percentage * scores.score score, measure_instances.id mid, measure_instances.opinion_poll_id oid
FROM question_instances
INNER JOIN scores ON question_instances.template_id = scores.question_template_id
INNER JOIN measure_instances ON measure_instances.template_id = scores.measure_template_id
INNER JOIN max_per_measure ON measure_instances.id = max_per_measure.mid
INNER JOIN per_question_percentage ON per_question_percentage.qid = question_instances.template_id
WHERE measure_instances.opinion_poll_id = question_instances.opinion_poll_id AND question_instances.opinion_poll_id = per_question_percentage.oid
GROUP BY measure_instances.id, measure_instances.opinion_poll_id, per_question_percentage.percentage, scores.score
)
UPDATE measure_instances
SET total_score = subquery.result*100
FROM (SELECT SUM(per_measure_per_opinion_poll.score)/max_per_measure.sum result, per_measure_per_opinion_poll.mid, per_measure_per_opinion_poll.oid
FROM max_per_measure, per_measure_per_opinion_poll
WHERE per_measure_per_opinion_poll.mid = max_per_measure.mid
AND per_measure_per_opinion_poll.oid = max_per_measure.oid
GROUP BY max_per_measure.sum, per_measure_per_opinion_poll.mid, per_measure_per_opinion_poll.oid)
AS subquery(result, mid, oid)
WHERE measure_instances.id = subquery.mid
AND measure_instances.opinion_poll_id = subquery.oid
RETURNING total_score;
Is this canonical SQL? Is there anything I should be aware of with this kind of CTE chaining (or otherwise)? Is there a more efficient way to achieve the same thing?
This is a bit long for a comment.
I don't understand the questions.
Why is the result an integer rather than a float?
Because measure_instances.total_score is an integer and that is what the returning clause is returning.
Why is the result not being grouped by measure instance instead being identical across all?
When I run the CTE independently, the values are 0.45. The data and logic dictate the same values.
And why is the result incorrect for any of them?
I think you mean "for all of them". In any case, the results look correct to me.
If you run this query against data in your demo:
SELECT
answer_options.weight, measure_instances.id
FROM
question_instances
INNER JOIN
answer_options ON question_instances.template_id = answer_options.question_template_id
INNER JOIN
scores ON question_instances.template_id = scores.question_template_id
INNER JOIN
measure_instances ON measure_instances.template_id=scores.measure_template_id
WHERE
measure_instances.opinion_poll_id = question_instances.opinion_poll_id
ORDER BY
2;
You will get:
| weight | id |
|--------|----|
| 0.5 | 1 |
| 0.25 | 1 |
| 0.25 | 1 |
| 0.75 | 1 |
| 0.5 | 1 |
| 0.75 | 2 |
| 0.5 | 2 |
| 0.25 | 2 |
| 0.5 | 2 |
| 0.25 | 2 |
If you calculate averages by hand, you will get:
For id=1 ==> 0.5+0.25+0.25+0.75 + 0.5 = 2.25 ==> 2.25 / 5 = 0.45
For id=2 ==> 0.75 + 0.5 + 0.25 + 0.5 + 0.25 = 2.25 ==> 2.25 / 5 = 0.45
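The same hand calculation can be checked with a few lines of Python. This is just a sketch that copies the ten (weight, id) rows from the result above and averages them per id; the variable names are my own.

```python
from collections import defaultdict

# (weight, measure_instances.id) rows copied from the query result above.
rows = [
    (0.5, 1), (0.25, 1), (0.25, 1), (0.75, 1), (0.5, 1),
    (0.75, 2), (0.5, 2), (0.25, 2), (0.5, 2), (0.25, 2),
]

# Collect the weights per measure instance, like GROUP BY does.
weights_by_id = defaultdict(list)
for weight, measure_id in rows:
    weights_by_id[measure_id].append(weight)

# AVG(weight) per group, then the "* 100" applied by the UPDATE.
averages = {mid: sum(ws) / len(ws) for mid, ws in weights_by_id.items()}
scores = {mid: round(avg * 100) for mid, avg in averages.items()}
print(averages, scores)
```

Both groups contain the same multiset of weights, so both averages come out the same, and multiplying by 100 gives a whole number that an INTEGER column stores without a fractional part.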
It seems to me, that this query is working perfectly.
Please explain why these results are wrong to you, and what do you expect to get from the above data and query?

Selecting a number of related records into a result row

I am currently writing an export function for an MS-Access database and I am not quite sure how to write a query that gives me the results that I want.
What I am trying to do is the following:
Let's say I have a table Error and there is a many-to-many relationship to the table Cause, modeled by the table ErrorCauses. Currently I have a query similar to this (simplified, the original also goes one relationship further):
select Error.ID, Cause.ID
from ((Error inner join ErrorCauses on Error.ID = ErrorCauses.Error)
left join Cause on ErrorCauses.Cause = Cause.ID)
I get something like this:
Error | Cause
-------------
12345 | 12
12345 | 23
67890 | 23
67890 | 34
But I need to select the IDs of the first, say, 3 Causes for each error (even if those are empty), so that it looks like this:
Error | Cause1 | Cause2 | Cause3
--------------------------------
12345 | 12 | 23 |
67890 | 23 | 34 |
Is there any way to do this in a single query?
Like selecting the Top 3 and then flattening this into the resulting row?
Thanks in advance for any pointers.
Your requirement is for a specific number of causes--3. This makes it possible and manageable to get three different causes on the same row by doing a three-way join on the same subquery.
First, let's define your error-and-cause query as a straight-up Access query (a QueryDef object, if you want to be technical).
qryErrorCauseInfo:
select
Error.ID as ErrorID
, Cause.ID as CauseID
from (Error
inner join ErrorCauses
on Error.ID = ErrorCauses.Error)
left outer join Cause
on ErrorCauses.Cause = Cause.ID
By the way, I feel that the above left join should really be an inner join, for the reason I mentioned in my comment.
Next, let's do a three-way join to get possible combinations of causes in rows:
qryTotalCause:
select distinct
*
, iif(Cause1 is null, 0, 1)
+ iif(Cause2 is null, 0, 1)
+ iif(Cause3 is null, 0, 1) as TotalCause
from (
select
eci1.ErrorID
, eci1.CauseID as Cause1
, iif(eci2.CauseID = Cause1, null, eci2.CauseID) as Cause2
, iif(
eci3.CauseID = Cause1 or eci3.CauseID = Cause2
, null
, eci3.CauseID
) as Cause3
from (qryErrorCauseInfo as eci1
left outer join qryErrorCauseInfo as eci2
on eci1.ErrorID = eci2.ErrorID)
left outer join qryErrorCauseInfo as eci3
on eci2.ErrorID = eci3.ErrorID
) as sq
where (
Cause1 < Cause2
and Cause2 < Cause3
) or (
Cause1 < Cause2
and Cause3 is null
) or (
Cause2 is null
and Cause3 is null
) or (
Cause1 is null
and Cause2 is null
and Cause3 is null
)
Finally, we need a correlated subquery to select, for each error, the one row with the highest number of causes (the rest of the rows are simply different permutations of the same causes):
select
ErrorID
, Cause1
, Cause2
, Cause3
from qryTotalCause as tc1
where tc1.TotalCause = (
select max(tc2.TotalCause)
from qryTotalCause as tc2
where tc1.ErrorID = tc2.ErrorID
)
Simple! (Not :-)
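To show the underlying idea more compactly, here is a sketch of a different technique: number the causes within each error with a correlated COUNT, then pivot positions 1 to 3 into columns with conditional aggregation. It is written for SQLite (run through Python's sqlite3 module), not Access, so the CASE expressions would have to become IIf() calls there. It also assumes each cause appears at most once per error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ErrorCauses (Error INTEGER, Cause INTEGER);
INSERT INTO ErrorCauses VALUES
    (12345, 12), (12345, 23), (67890, 23), (67890, 34);
""")

rows = conn.execute("""
SELECT Error,
       MAX(CASE WHEN rn = 1 THEN Cause END) AS Cause1,
       MAX(CASE WHEN rn = 2 THEN Cause END) AS Cause2,
       MAX(CASE WHEN rn = 3 THEN Cause END) AS Cause3
FROM (
    -- rn = position of this cause among the causes of the same error
    SELECT ec.Error, ec.Cause,
           (SELECT COUNT(*) FROM ErrorCauses AS x
             WHERE x.Error = ec.Error AND x.Cause <= ec.Cause) AS rn
    FROM ErrorCauses AS ec
) AS ranked
GROUP BY Error
ORDER BY Error
""").fetchall()
print(rows)
```

Causes beyond the third are ignored, and missing positions come back as NULL (None in Python).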

How can I find tables which reference a particular row via a foreign key?

Given a structure like this:
CREATE TABLE reference_table (
reference_table_key numeric NOT NULL,
reference_value numeric,
CONSTRAINT reference_table_pk PRIMARY KEY (reference_table_key)
);
CREATE TABLE other_table (
other_table_key numeric NOT NULL,
reference_table_key numeric,
CONSTRAINT other_table_pk PRIMARY KEY (other_table_key),
CONSTRAINT other_table_reference_fk FOREIGN KEY (reference_table_key)
REFERENCES reference_table (reference_table_key) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
);
CREATE TABLE another_table (
another_table_key numeric NOT NULL,
do_stuff_key numeric,
CONSTRAINT another_table_pk PRIMARY KEY (another_table_key),
CONSTRAINT another_table_reference_fk FOREIGN KEY (do_stuff_key)
REFERENCES reference_table (reference_table_key) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE SET NULL
);
--there are 50-60 tables which have similar foreign key references to reference_table
I want to write a query that tells me the primary keys in other_table and another_table and potentially more tables where reference_value is NULL.
In pseudo-code:
SELECT table_name, table_primary_key, table_fk_column_name
FROM ?????? some PG table ???????, reference_table
WHERE reference_table.reference_value IS NULL;
The result would look something like:
table_name | table_primary_key | table_fk_column_name | reference_table_pk
---------------------------------------------------------------------------
other_table | 2 | reference_table_key | 7
other_table | 4 | reference_table_key | 56
other_table | 45 | reference_table_key | 454
other_table | 65765 | reference_table_key | 987987
other_table | 11 | reference_table_key | 3213
another_table | 3 | do_stuff_key | 4645
another_table | 5 | do_stuff_key | 43546
another_table | 7 | do_stuff_key | 464356
unknown_table | 1 | unkown_column_key | 435435
unknown_table | 1 | some_other_column_key | 34543
unknown_table | 3 | unkown_column_key | 124
unknown_table | 3 | some_other_column_key | 123
This is similar to, but not a duplicate of Postgres: SQL to list table foreign keys . That question shows the table structure. I want to find specific instances.
Essentially if I were to DELETE FROM reference_table WHERE reference_value IS NULL;, postgres has to do something internally to figure out that it needs to set reference_table_key in row 2 in other_table to NULL. I want to see what those rows would be.
Is there a query that can do this? Is there a modifier that I can pass to a DELETE call that would tell me what tables/rows/columns would be affected by that DELETE?
NULL values in referencing columns
This query produces the DML statement to find all rows in all tables where a column has a foreign-key constraint referencing another table but holds a NULL value in that column:
WITH x AS (
SELECT c.conrelid::regclass AS tbl
, c.confrelid::regclass AS ftbl
, quote_ident(k.attname) AS fk
, quote_ident(pf.attname) AS pk
FROM pg_constraint c
JOIN pg_attribute k ON (k.attrelid, k.attnum) = (c.conrelid, c.conkey[1])
JOIN pg_attribute f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
LEFT JOIN pg_constraint p ON p.conrelid = c.conrelid AND p.contype = 'p'
LEFT JOIN pg_attribute pf ON (pf.attrelid, pf.attnum)
= (p.conrelid, p.conkey[1])
WHERE c.contype = 'f'
AND c.confrelid = 'fk_tbl'::regclass -- references to this tbl
AND f.attname = 'fk_tbl_id' -- and only to this column
)
SELECT string_agg(format(
'SELECT %L AS tbl
, %L AS pk
, %s::text AS pk_val
, %L AS fk
, %L AS ftbl
FROM %1$s WHERE %4$s IS NULL'
, tbl
, COALESCE(pk, 'NONE')
, COALESCE(pk, 'NULL')
, fk
, ftbl), '
UNION ALL
') || ';'
FROM x;
Produces a query like this:
SELECT 'some_tbl' AS tbl
, 'some_tbl_id' AS pk
, some_tbl_id::text AS pk_val
, 'fk_tbl_id' AS fk
, 'fk_tbl' AS ftbl
FROM some_tbl WHERE fk_tbl_id IS NULL
UNION ALL
SELECT 'other_tbl' AS tbl
, 'other_tbl_id' AS pk
, other_tbl_id::text AS pk_val
, 'some_name_id' AS fk
, 'fk_tbl' AS ftbl
FROM other_tbl WHERE some_name_id IS NULL;
Produces output like this:
tbl | pk | pk_val | fk | ftbl
-----------+--------------+--------+--------------+--------
some_tbl | some_tbl_id | 49 | fk_tbl_id | fk_tbl
some_tbl | some_tbl_id | 58 | fk_tbl_id | fk_tbl
other_tbl | other_tbl_id | 66 | some_name_id | fk_tbl
other_tbl | other_tbl_id | 67 | some_name_id | fk_tbl
Does not cover multi-column foreign or primary keys reliably. You have to make the query more complex for this.
I cast all primary key values to text to cover all types.
Adapt or remove these lines to find foreign keys pointing to another or any column / table:
AND c.confrelid = 'fk_tbl'::regclass
AND f.attname = 'fk_tbl_id' -- and only this column
Tested with PostgreSQL 9.1.4. I use the pg_catalog tables. Realistically nothing of what I use here is going to change, but that is not guaranteed across major releases. Rewrite it with tables from information_schema if you need it to work reliably across updates. That is slower, but sure.
I did not sanitize table names in the generated DML script, because quote_ident() would fail with schema-qualified names. It is your responsibility to avoid harmful table names like "users; DELETE * FROM users;". With some more effort, you can retrieve schema-name and table name separately and use quote_ident().
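To make the generation step easier to follow, here is a rough Python sketch of what the string_agg(format(...)) part does: it takes one (table, primary key, foreign key) triple per referencing table and glues one SELECT per triple together with UNION ALL. The triples are hard-coded, hypothetical stand-ins for the rows the catalog query returns, quote_ident and quote_literal are small local helpers (not the Postgres functions), and the result is run against toy SQLite tables rather than Postgres.

```python
import sqlite3

# Hypothetical (table, pk column, fk column) triples, standing in for
# the rows the catalog CTE would return.
referencing = [
    ("some_tbl", "some_tbl_id", "fk_tbl_id"),
    ("other_tbl", "other_tbl_id", "some_name_id"),
]

def quote_ident(name):
    """Local helper: double-quote an identifier, escaping inner quotes."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value):
    """Local helper: single-quote a string literal, escaping inner quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_query(triples):
    """Build one SELECT per referencing table, joined by UNION ALL."""
    parts = []
    for tbl, pk, fk in triples:
        parts.append(
            "SELECT {tl} AS tbl, {pl} AS pk, "
            "CAST({pk} AS TEXT) AS pk_val, {fl} AS fk\n"
            "FROM {t} WHERE {fk} IS NULL".format(
                tl=quote_literal(tbl), pl=quote_literal(pk),
                pk=quote_ident(pk), fl=quote_literal(fk),
                t=quote_ident(tbl), fk=quote_ident(fk),
            )
        )
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_tbl (some_tbl_id INTEGER PRIMARY KEY, fk_tbl_id INTEGER);
CREATE TABLE other_tbl (other_tbl_id INTEGER PRIMARY KEY, some_name_id INTEGER);
INSERT INTO some_tbl VALUES (49, NULL), (50, 1);
INSERT INTO other_tbl VALUES (66, NULL), (67, NULL);
""")

sql = build_query(referencing)
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)
```

Quoting every identifier and literal in the generated text is what guards against the harmful table names mentioned above.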
NULL values in referenced columns
My first solution does something subtly different from what you ask, because what you describe (as I understand it) is non-existent. The value NULL is "unknown" and cannot be referenced. If you actually want to find rows with a NULL value in a column that has FK constraints pointing to it (not to the particular row with the NULL value, of course), then the query can be much simplified:
WITH x AS (
SELECT c.confrelid::regclass AS ftbl
,quote_ident(f.attname) AS fk
,quote_ident(pf.attname) AS pk
,string_agg(c.conrelid::regclass::text, ', ') AS referencing_tbls
FROM pg_constraint c
JOIN pg_attribute f ON (f.attrelid, f.attnum) = (c.confrelid, c.confkey[1])
LEFT JOIN pg_constraint p ON p.conrelid = c.confrelid AND p.contype = 'p'
LEFT JOIN pg_attribute pf ON (pf.attrelid, pf.attnum)
= (p.conrelid, p.conkey[1])
WHERE c.contype = 'f'
-- AND c.confrelid = 'fk_tbl'::regclass -- only referring this tbl
GROUP BY 1, 2, 3
)
SELECT string_agg(format(
'SELECT %L AS ftbl
, %L AS pk
, %s::text AS pk_val
, %L AS fk
, %L AS referencing_tbls
FROM %1$s WHERE %4$s IS NULL'
, ftbl
, COALESCE(pk, 'NONE')
, COALESCE(pk, 'NULL')
, fk
, referencing_tbls), '
UNION ALL
') || ';'
FROM x;
Finds all such rows in the entire database (commented out the restriction to one table). Tested with Postgres 9.1.4 and works for me.
I group multiple tables referencing the same foreign column into one query and add a list of referencing tables to give an overview.
You want a union for this query:
select *
from ((select 'other_table' as table_name,
other_table_key as primary_key,
'reference_table_key' as table_fk,
ot.reference_table_key
from other_table ot left outer join
reference_table rt
on ot.reference_table_key = rt.reference_table_key
where rt.reference_value is null
) union all
(select 'another_table' as table_name,
another_table_key as primary_key,
'do_stuff_key' as table_fk,
at.do_stuff_key
from another_table at left outer join
reference_table rt
on at.do_stuff_key = rt.reference_table_key
where rt.reference_value is null
)
) t;