SQL Case statement with Count? - sql

I have a database that allows for more than one ethnicity per person. Unfortunately, our answers are essentially Yes Hispanic, Not Hispanic, and Unknown, and there are some who do indeed have multiple selections. I need to run a large query that pulls lots of info, one of which is ethnicity, and I want to "convert" those that have multiple selections as Unknown.
person_ethnicity_xref table:
| Person_ID | Ethnicity_ID |
|-----------|--------------|
| 1234567 | SLWOWQ |
| 1234567 | ZLKJDU |
mstr_lists table:
| Ethnicity_ID | Ethnicity |
|--------------|--------------|
| SLWOWQ | Hispanic |
| ZLKJDU | Not Hispanic |
I've been struggling with this as I can't use a For XML Path with two tables, so I'm now trying to use the logic of
Case
When count(ethnicity_ID)>1 then 'Unknown'
Else Ethnicity
End
Here's what I have
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else ml1.mstr_list_item_desc
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr,
ml1.mstr_list_item_desc
This gave me results, but when I check them, people with more than one ethnicity don't get a value of Unknown; instead they are listed twice, once with each ethnicity.
Another part of this larger query has a subquery in the FROM that counts race and a separate table join for only those with a count=1. Then the case says if the subquery that counts race came up with >1 then X otherwise use that other table for count=1. Because the race table also uses that mstr_list there's then 5 tables involved (there's a second person_id join now that I look at it more closely, and there's a mstr_list to the count and the regular tables...I have no idea why, my brain is tired and that count table isn't a simple count and also is doing something else). Is this really the only option? This query already takes over 10 min to run (it is not run on production!) and I'd hate to make it worse by duplicating what the previous writer did.

Use aggregation:
select p.person_nbr,
(case when min(ml1.mstr_list_item_desc) = max(ml1.mstr_list_item_desc)
then min(ml1.mstr_list_item_desc)
else 'Unknown'
end) as final_ethnicity
from person_table p left join
person_ethnicity_xref eth1
on p.person_id = eth1.person_id left join
mstr_lists ml1
on eth1.ethnicity_item_id = ml1.mstr_list_item_id
group by p.person_nbr;
Note: This slightly tweaks your logic. If there are multiple ethnicity rows and they are all the same, then that value is used.
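If you want to try the aggregation idea without touching the real database, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the third person ('nil', with no ethnicity rows at all) is a hypothetical addition to show one edge case, and SQLite stands in for whatever engine you actually run. It compares the min/max test with a plain count test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table person_table(person_id int, person_nbr varchar(10));
create table person_ethnicity_xref(person_id int, ethnicity_item_id varchar(50));
create table mstr_lists(mstr_list_item_id varchar(50), mstr_list_item_desc varchar(50));
-- 'nil' is a hypothetical person with no ethnicity rows
insert into person_table values (1234567, 'abc'), (1234568, 'xyz'), (1234569, 'nil');
insert into person_ethnicity_xref values
    (1234567, 'SLWOWQ'), (1234567, 'ZLKJDU'), (1234568, 'ZLKJDU');
insert into mstr_lists values ('SLWOWQ', 'Hispanic'), ('ZLKJDU', 'Not Hispanic');
""")

# Shared FROM / GROUP BY: one output row per person.
JOINS = """
from person_table p
left join person_ethnicity_xref eth1 on p.person_id = eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id = ml1.mstr_list_item_id
group by p.person_nbr
order by p.person_nbr
"""

# Variant 1: 'Unknown' unless every joined description is identical.
min_max_rows = conn.execute("""
select p.person_nbr,
       case when min(ml1.mstr_list_item_desc) = max(ml1.mstr_list_item_desc)
            then min(ml1.mstr_list_item_desc)
            else 'Unknown'
       end as final_ethnicity
""" + JOINS).fetchall()

# Variant 2: 'Unknown' only when more than one xref row exists.
count_rows = conn.execute("""
select p.person_nbr,
       case when count(eth1.ethnicity_item_id) > 1 then 'Unknown'
            else min(ml1.mstr_list_item_desc)
       end as final_ethnicity
""" + JOINS).fetchall()

print(min_max_rows)
print(count_rows)
```

The two variants agree for 'abc' and 'xyz'. They differ for a person with no ethnicity rows: min = max compares NULL with NULL, which is not true, so the first variant falls through to 'Unknown', while the count variant returns NULL. Which of the two you want depends on your reporting rules.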

I think if you remove ml1.mstr_list_item_desc from group by clause your query would work. And use min(ml1.mstr_list_item_desc) instead of ml1.mstr_list_item_desc.
Below query would work.
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else min(ml1.mstr_list_item_desc)
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr
DB-FIDDLE Example:
Schema and insert statements:
create table person_table(person_id int,person_nbr varchar(10));
insert into person_table values(1234567,'abc');
insert into person_table values(1234568,'xyz');
create table person_ethnicity_xref (Person_ID int, ethnicity_item_id varchar(50));
insert into person_ethnicity_xref values(1234567, 'SLWOWQ');
insert into person_ethnicity_xref values(1234567, 'ZLKJDU');
insert into person_ethnicity_xref values(1234568, 'ZLKJDU');
create table mstr_lists(mstr_list_item_id varchar(50), mstr_list_item_desc varchar(50));
insert into mstr_lists values('SLWOWQ','Hispanic');
insert into mstr_lists values('ZLKJDU','Not Hispanic');
Query:
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else min(ml1.mstr_list_item_desc)
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr
Output:
| person_nbr | final eth |
|------------|--------------|
| abc | Unknown |
| xyz | Not Hispanic |

Related

Efficient/optimized query for my query using multiple UNIONS with JOIN

Can someone please have a look at my query and suggest any improvement or an optimized version so that it runs faster?
Basically, I have two tables, Survey and SurveyInvite.
Sample data for Table Survey
CREATE TABLE dbo.Survey
(
createdate date,
emailinvite char(4),
phoneinvite char(4),
smsinvite char(4),
surveyid int
);
INSERT dbo.Survey VALUES
('20220201','12ab','12bc', null ,1),
('20220210','23be','45hg','45tr',2),
('20220220','65hg', null ,'89kj',3);
Sample data for Table SurveyInvite
CREATE TABLE dbo.SurveyInvite
(
sentdate date,
id char(4)
);
INSERT dbo.SurveyInvite VALUES
('20220201','12ab'),
('20220205','12bc'),
('20220210','23be'),
('20220214','45hg'),
('20220218','45tr'),
('20220220','65hg'),
('20220224','89kj');
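The output should be
| Type | sentdate | inviteid | surveyid |
|-------|------------|----------|----------|
| Email | 2022-02-01 | 12ab | 1 |
| Email | 2022-02-10 | 23be | 2 |
| Email | 2022-02-20 | 65hg | 3 |
| Phone | 2022-02-05 | 12bc | 1 |
| Phone | 2022-02-14 | 45hg | 2 |
| SMS | 2022-02-18 | 45tr | 2 |
| SMS | 2022-02-24 | 89kj | 3 |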
So basically, I have to get sentdate from SurveyInvite table against each type(email,phone,sms).
Survey table should be unpivoted on email,phone and sms to transform column into rows.
Here's my query
SELECT 'Email' as Type,esi.sentdate,emailinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite esi on s.emailinvite=esi.id
UNION
SELECT 'SMS' as Type,ssi.sentdate,smsinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite ssi on s.smsinvite=ssi.id
UNION
SELECT 'Phone' as Type,psi.sentdate,phoneinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite psi on s.phoneinvite=psi.id
Please suggest another way to write the query if it would make it faster. I am still experimenting with UNPIVOT, LEFT JOIN, and CTEs to avoid using UNION.
You don't need to query the tables three times; you can just unpivot. The easiest way to do this is with CROSS APPLY (VALUES ...):
SELECT
v.Type,
ssi.sentdate,
v.inviteid,
s.surveyid
FROM Survey s
CROSS APPLY (VALUES
('Email', s.emailinvite),
('Phone', s.phoneinvite),
('SMS', s.smsinvite)
) v (Type, inviteid)
INNER JOIN SurveyInvite ssi on v.inviteid = ssi.id;
I suggest you consider normalizing your database in the first place by storing the data unpivoted in a separate table.
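To check the single-pass idea against the expected output without a SQL Server instance, here is a rough sketch using Python's sqlite3 module. SQLite has neither CROSS APPLY nor UNPIVOT, so this swaps in a cross join against a three-row label list plus a CASE expression. That substitution is my own stand-in, not the T-SQL syntax, and the CTE name kinds is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Survey (createdate TEXT, emailinvite TEXT, phoneinvite TEXT,
                     smsinvite TEXT, surveyid INT);
INSERT INTO Survey VALUES
    ('2022-02-01', '12ab', '12bc', NULL,   1),
    ('2022-02-10', '23be', '45hg', '45tr', 2),
    ('2022-02-20', '65hg', NULL,   '89kj', 3);
CREATE TABLE SurveyInvite (sentdate TEXT, id TEXT);
INSERT INTO SurveyInvite VALUES
    ('2022-02-01', '12ab'), ('2022-02-05', '12bc'), ('2022-02-10', '23be'),
    ('2022-02-14', '45hg'), ('2022-02-18', '45tr'), ('2022-02-20', '65hg'),
    ('2022-02-24', '89kj');
""")

# Each Survey row is paired with the three labels, and the CASE picks the
# matching invite column. NULL invite ids never match, so they drop out.
rows = conn.execute("""
WITH kinds(type) AS (VALUES ('Email'), ('Phone'), ('SMS'))
SELECT k.type, i.sentdate, i.id AS inviteid, s.surveyid
FROM Survey s
CROSS JOIN kinds k
JOIN SurveyInvite i
  ON i.id = CASE k.type WHEN 'Email' THEN s.emailinvite
                        WHEN 'Phone' THEN s.phoneinvite
                        ELSE s.smsinvite
            END
ORDER BY k.type, s.surveyid
""").fetchall()

for row in rows:
    print(row)
```

Survey is referenced once in the FROM clause instead of three times, and no duplicate-removing UNION is involved. How the optimizer actually executes it will differ between engines, so treat this as a correctness check rather than a benchmark.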
Another way (again the key is to only read either table once instead of three times):
SELECT i.sentdate,
[Type] = REPLACE(u.Types, 'invite', ''),
inviteid = u.id,
u.surveyid
FROM dbo.Survey AS s
UNPIVOT (Id FOR Types IN
(emailinvite, phoneinvite, smsinvite)) AS u
INNER JOIN dbo.SurveyInvite AS i ON u.Id = i.id;
As you can see from the db<>fiddle, this eliminates 4 of the 6 table scans and also an expensive distinct sort.
I assume that you have set the primary and foreign keys correctly. It might also be beneficial to have indexes on the foreign keys. See: Should every SQL Server foreign key have a matching index?.
As always with performance questions, only benchmarking the different variants can tell you which one is fastest. The same query can perform very differently with a different set of data.
One possibility is to use joins and base the query on SurveyInvite:
SELECT
I.sentdate,
CASE WHEN SE.surveyid IS NOT NULL THEN 'Email'
WHEN SP.surveyid IS NOT NULL THEN 'Phone'
ELSE 'SMS'
END AS Type,
I.id AS inviteid,
CASE WHEN SE.surveyid IS NOT NULL THEN SE.surveyid
WHEN SP.surveyid IS NOT NULL THEN SP.surveyid
ELSE SS.surveyid
END AS surveyid
FROM
SurveyInvite I
LEFT JOIN Survey SE
ON I.id = SE.emailinvite
LEFT JOIN Survey SP
ON I.id = SP.phoneinvite
LEFT JOIN Survey SS
ON I.id = SS.smsinvite

What is the best way to join tables

this is more like a general question.
I am looking for the best way to join 4, maybe 5, different tables. I am trying to create a Power BI report pulling live information from an IBM AS400, where customer service can type one of our part numbers, see how many parts we have in inventory and, if there are none, see the lead time and whether any orders have already been entered for that part number.
SERI is our inventory table with 37180 records.
(active inventory that is available)
METHDM is our kit table with 37459 records.
(this table contains the bill of materials for custom kits; KIT A123 contains different part numbers, which are in SERI as well.)
STKA is our part lead time table with 76796 records.
(lead time means how long will it take for parts to come in)
OCRI is our sales order table with 6497 records.
(This table contains all customer orders)
I have some knowledge of writing queries, but this one is more challenging than what I have created in the past. Should I start with the table that has the most records and left join the rest?
From STKA 76796 records
Left join METHDM 37459 records on STKA
left join SERI 37180 records on STKA
left join OCRI 6497 records on STKA
Select
STKA.v6part as part,
STKA.v6plnt as plant,
STKA.v6tdys as pur_leadtime,
STKA.v6prpt as Pur_PrepLeadtime,
STKA.v6lead as Mfg_leadtime,
STKA.v6prpt as Mfg_PrepLeadTime,
METHDM.AQMTLP AS COMPONENT,
METHDM.AQQPPC AS QTYNEEDED,
SERI.HTLOTN AS BATCH,
SERI.HTUNIT AS UOM,
(HTQTY - HTQTYC) as ONHAND,
OCRI.DDORD# AS SALESORDER,
OCRI.DDRDAT AS PROMISED
from stka
left join METHDM on STKA.V6PART = METHDM.AQPART
left join SERI on STKA.V6PART = SERI.HTPART
left join OCRI on STKA.V6PART = OCRI.DDPART
Is this the best way to join the tables?
I think you already have your answer, but conceptually, there are a few issues here to deal with, and I figured I would give you a few examples, using data a little bit like yours, but massively simplified.
CREATE TABLE #STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #METHDM (AQPART INT, KIT_ID INT, SOME_DATE DATETIME, OTHER_DATA VARCHAR(50));
CREATE TABLE #SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO #STKA SELECT 1, NULL UNION ALL SELECT 2, NULL UNION ALL SELECT 3, NULL; --1, 2, 3 Ids
INSERT INTO #METHDM SELECT 1, 1, '20200108 10:00', NULL UNION ALL SELECT 1, 2, '20200108 11:00', NULL UNION ALL SELECT 2, 1, '20200108 13:00', NULL; --1 Id appears twice, 2 Id once, no 3 Id
INSERT INTO #SERI SELECT 1, NULL UNION ALL SELECT 3, NULL; --1 and 3 Ids
INSERT INTO #OCRI SELECT 1, NULL UNION ALL SELECT 4, NULL; --1 and 4 Ids
So fundamentally we have a few issues here:
o the first problem is that the IDs in the tables differ, one table has an ID #4 but this isn't in any of the others;
o the second issue is that we have multiple rows for the same ID in one table;
o the third issue is that some tables are "missing" IDs that are in other tables, which you already covered by using LEFT JOINs, so I will ignore this.
--This will select ID 1 twice, 2 once, 3 once, and miss 4 completely
SELECT
*
FROM
#STKA
LEFT JOIN #METHDM ON #METHDM.AQPART = #STKA.V6PART
LEFT JOIN #SERI ON #SERI.HTPART = #STKA.V6PART
LEFT JOIN #OCRI ON #OCRI.DDPART = #STKA.V6PART;
So the problem here is that we don't have every ID in our "anchor" table STKA, and in fact there's no single table that has every ID in it. Now your data might be fine here, but if it isn't then you can simply add a step to find every ID, and use this as the anchor.
--This will select each ID, but still doubles up on ID 1
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN #METHDM ON #METHDM.AQPART = I.Id
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
That's using a common-table expression, but a subquery would also do the job. However, this still leaves us with an issue where ID 1 appears twice in the list, because it has multiple rows in one of the sub-tables.
One way to fix this is to pick the row with the latest date, or any other ORDER you can apply to the data:
--Pick the best row for the table where it has multiple rows, now we get one row per ID
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI),
BestMETHDM AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
FROM
#METHDM)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.Id AND BestMETHDM.ORDER_ID = 1
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
Of course you could also add some aggregation (SUM, MAX, MIN, AVG, etc.) to fix this problem (if it is indeed an issue). Also, I used a common-table expression, but this would work just as well with a subquery.
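For anyone who wants to run the last step end to end, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later), uses ordinary tables instead of # temp tables, and trims the columns down to the keys; those are my simplifications of the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STKA   (V6PART INT);
CREATE TABLE METHDM (AQPART INT, KIT_ID INT, SOME_DATE TEXT);
CREATE TABLE SERI   (HTPART INT);
CREATE TABLE OCRI   (DDPART INT);
INSERT INTO STKA   VALUES (1), (2), (3);
INSERT INTO METHDM VALUES (1, 1, '2020-01-08 10:00'),
                          (1, 2, '2020-01-08 11:00'),
                          (2, 1, '2020-01-08 13:00');
INSERT INTO SERI   VALUES (1), (3);
INSERT INTO OCRI   VALUES (1), (4);
""")

# Ids collects every part number seen in any table (the "anchor");
# BestMETHDM numbers the rows per part so that only the latest is joined.
one_row_per_id = conn.execute("""
WITH Ids AS (
    SELECT V6PART AS ID FROM STKA
    UNION SELECT AQPART FROM METHDM
    UNION SELECT HTPART FROM SERI
    UNION SELECT DDPART FROM OCRI),
BestMETHDM AS (
    SELECT AQPART, KIT_ID,
           ROW_NUMBER() OVER (PARTITION BY AQPART
                              ORDER BY SOME_DATE DESC) AS ORDER_ID
    FROM METHDM)
SELECT I.ID, STKA.V6PART, B.KIT_ID, SERI.HTPART, OCRI.DDPART
FROM Ids I
LEFT JOIN STKA ON STKA.V6PART = I.ID
LEFT JOIN BestMETHDM B ON B.AQPART = I.ID AND B.ORDER_ID = 1
LEFT JOIN SERI ON SERI.HTPART = I.ID
LEFT JOIN OCRI ON OCRI.DDPART = I.ID
ORDER BY I.ID
""").fetchall()

for row in one_row_per_id:
    print(row)
```

Each ID comes back exactly once, ID 4 is no longer lost, and ID 1 carries KIT_ID 2 because that row has the later SOME_DATE.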
Expanding on a comment made on the question..
I would say I will start with SERI as that table contains the entire inventory for our facility and should cover the other tables
However the question said
SERI is our inventory table with 37180 records. (active inventory that is available)
In my experience, active inventory, isn't the same as all parts.
Normally, in a query like this, I'd expect the first table to be a Parts Master table of some sort that contains every possible part ID.

SELECT Statement in CASE

Please don't downvote this; it is a bit complex for me to explain. I'm working on a data migration, so some of the structures look odd because that is how they were originally designed.
For ex, I have a table Person with PersonID and PersonName as columns. I have duplicates in the table.
I have Details table where I have PersonName stored in a column. This PersonName may or may not exist in the Person table. I need to retrieve PersonID from the matching records otherwise put some hardcode value in PersonID.
I can't write below query because PersonName is duplicated in Person Table, this join doubles the rows if there is a matching record due to join.
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The below query works but I don't know how to replace "NULL" with some value I want in place of NULL
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So, there are some PersonNames in the Details table which are not existent in Person table. How do I write CASE WHEN in this case?
I tried below but it didn't work
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query is still showing the same output as 2nd query. Please advise me on this. Let me know, if I'm unclear anywhere. Thanks
Well, I figured out that I can wrap the subquery in ISNULL to make it work. Note that the subquery needs its own parentheses.
SELECT d.Fields,
ISNULL((SELECT TOP 1 p.PersonID
FROM Person p WHERE p.PersonName = d.PersonName), 124) AS id
FROM Details d
A simple left outer join to pull back all persons with an optional match on the details table should work with a case statement to get your desired result.
SELECT
*
FROM
(
SELECT
Instance=ROW_NUMBER() OVER (PARTITION BY p.PersonName ORDER BY p.PersonID),
PersonID=CASE WHEN d.PersonName IS NULL THEN 'XXXX' ELSE p.PersonID END,
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1
Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
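Here is a quick way to convince yourself that the double LEFT JOIN keeps one row per Details record, sketched with Python's sqlite3 module. The sample names and IDs are invented for the illustration, and COALESCE is used in place of ISNULL because SQLite does not have the two-argument ISNULL from SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person  (PersonID INT, PersonName TEXT);
CREATE TABLE Details (Fields TEXT, PersonName TEXT);
-- 'Ann' is duplicated in Person; 'Zed' does not exist in Person at all
INSERT INTO Person  VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
INSERT INTO Details VALUES ('d1', 'Ann'), ('d2', 'Bob'), ('d3', 'Zed');
""")

# p1 finds every Person row with the same name; p2 looks for a row with the
# same name and a smaller ID. Keeping only rows where no such p2 exists
# leaves the lowest PersonID per name, or the default when p1 found nothing.
rows = conn.execute("""
SELECT d.Fields, COALESCE(p1.PersonID, 123) AS PersonID
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
                   AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
ORDER BY d.Fields
""").fetchall()

print(rows)
```

The duplicated name resolves to the smallest PersonID, and the unmatched name gets the default of 123. This relies on PersonID being unique within Person; two rows with the same name and the same ID would still both survive.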
You could use common table expressions to build up the missing datasets, i.e. your complete Person table, then join that to your Detail table as follows;
declare @n int;
-- set your default PersonID here;
set @n = 123;
-- Make sure the previous SQL statement is terminated with a semicolon so that the WITH clause parses successfully.
-- First build our unique list of names from table Details.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third join unique datasets to get the PersonID when there is a match, otherwise use our default id @n.
-- NB, this would also include records when a Person exists with no Detail rows (they are filtered out with the final inner join)
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],@n) as [PersonID]
, coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = upp.[PersonName]
)
-- Fourth, join detail to the pseudo person table that includes either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];

Postgres LEFT JOIN with SUM, missing records

I am trying to get the count of certain types of records in a related table. I am using a left join.
So I have a query that isn't quite right and one that returns the correct results. The query with the correct results has a higher execution cost. I'd like to use the first approach, if I can correct its results. (see http://sqlfiddle.com/#!15/7c20b/5/2)
CREATE TABLE people(
id SERIAL,
name varchar not null
);
CREATE TABLE pets(
id SERIAL,
name varchar not null,
kind varchar not null,
alive boolean not null default false,
person_id integer not null
);
INSERT INTO people(name) VALUES
('Chad'),
('Buck'); --can't keep pets alive
INSERT INTO pets(name, alive, kind, person_id) VALUES
('doggio', true, 'dog', 1),
('dog master flash', true, 'dog', 1),
('catio', true, 'cat', 1),
('lucky', false, 'cat', 2);
My goal is to get a table back with ALL of the people and the counts of the KINDS of pets they have alive:
| ID | ALIVE_DOGS_COUNT | ALIVE_CATS_COUNT |
|----|------------------|------------------|
| 1 | 2 | 1 |
| 2 | 0 | 0 |
I made the example more trivial. In our production app (not really pets) there would be about 100,000 dead dogs and cats per person. Pretty screwed up I know, but this example is simpler to relay ;) I was hoping to filter all the 'dead' stuff out before the count. I have the slower query in production now (from sqlfiddle above), but would love to get the LEFT JOIN version working.
Typically fastest if you fetch all or most rows:
SELECT pp.id
, COALESCE(pt.a_dog_ct, 0) AS alive_dogs_count
, COALESCE(pt.a_cat_ct, 0) AS alive_cats_count
FROM people pp
LEFT JOIN (
SELECT person_id
, count(kind = 'dog' OR NULL) AS a_dog_ct
, count(kind = 'cat' OR NULL) AS a_cat_ct
FROM pets
WHERE alive
GROUP BY 1
) pt ON pt.person_id = pp.id;
Indexes are irrelevant here, full table scans will be fastest. Except if alive pets are a rare case, then a partial index should help. Like:
CREATE INDEX pets_alive_idx ON pets (person_id, kind) WHERE alive;
I included all columns needed for the query (person_id, kind) to allow index-only scans.
SQL Fiddle.
Typically fastest for a small subset or a single row:
SELECT pp.id
, count(kind = 'dog' OR NULL) AS alive_dogs_count
, count(kind = 'cat' OR NULL) AS alive_cats_count
FROM people pp
LEFT JOIN pets pt ON pt.person_id = pp.id
AND pt.alive
WHERE <some condition to retrieve a small subset>
GROUP BY 1;
You should at least have an index on pets.person_id for this (or the partial index from above), and possibly more, depending on the WHERE condition.
Related answers:
Query with LEFT JOIN not returning rows for count of 0
GROUP or DISTINCT after JOIN returns duplicates
Get count of foreign key from multiple tables
Your WHERE alive=true is actually filtering out the record for person_id = 2. Use the query below, which pushes the alive = true condition into the CASE expression instead. See your modified Fiddle
SELECT people.id,
pe.alive_dogs_count,
pe.alive_cats_count
FROM people
LEFT JOIN
(
select person_id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
from pets
GROUP BY person_id
) pe on people.id = pe.person_id
(OR) your version
SELECT
people.id,
COALESCE(SUM(case when pets.kind='dog' and alive = true then 1 else 0 end),0) as alive_dogs_count,
COALESCE(SUM(case when pets.kind='cat' and alive = true then 1 else 0 end),0) as alive_cats_count
FROM people
LEFT JOIN pets on people.id = pets.person_id
GROUP BY people.id;
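The difference between filtering in WHERE and filtering inside the join or the CASE can be reproduced in a few lines. This is only a sketch using Python's sqlite3 module as a stand-in for Postgres, so alive is stored as 1/0 rather than a real boolean, and the IDs are written out explicitly instead of using SERIAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                   kind TEXT NOT NULL, alive INT NOT NULL DEFAULT 0,
                   person_id INT NOT NULL);
INSERT INTO people (id, name) VALUES (1, 'Chad'), (2, 'Buck');
INSERT INTO pets (name, alive, kind, person_id) VALUES
    ('doggio', 1, 'dog', 1), ('dog master flash', 1, 'dog', 1),
    ('catio', 1, 'cat', 1), ('lucky', 0, 'cat', 2);
""")

COUNTS = """
SELECT people.id,
       SUM(CASE WHEN pets.kind = 'dog' THEN 1 ELSE 0 END) AS alive_dogs_count,
       SUM(CASE WHEN pets.kind = 'cat' THEN 1 ELSE 0 END) AS alive_cats_count
FROM people
"""

# Filter in WHERE: runs after the join, so a person whose only pets are
# dead loses all of their rows and vanishes from the result.
filter_in_where = conn.execute(COUNTS + """
LEFT JOIN pets ON people.id = pets.person_id
WHERE pets.alive = 1
GROUP BY people.id ORDER BY people.id
""").fetchall()

# Filter in ON: dead pets simply fail to match, and the LEFT JOIN still
# produces one NULL-extended row for that person.
filter_in_on = conn.execute(COUNTS + """
LEFT JOIN pets ON people.id = pets.person_id AND pets.alive = 1
GROUP BY people.id ORDER BY people.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

With the filter in WHERE, Buck disappears; with the filter in the ON clause, he comes back with zero counts. The ON variant also means dead pets are discarded before aggregation, which is the part that should matter when most rows are dead.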
JOIN with SUM
I think your original query was something like this:
SELECT people.id, stats.dog, stats.cat
FROM people
JOIN (SELECT person_id, count(kind)filter(where kind='dog') dog, count(kind)filter(where kind='cat') cat FROM pets WHERE alive GROUP BY person_id) stats
ON stats.person_id = people.id
That works smoothly, but you should understand, that the result will miss the people with 0 pets, because of inner join.
In order to include people who miss pets, you can:
firstly LEFT JOIN,
then GROUP BY joined result
and be ready for NULL values instead of counts.
See the accepted answer above.
Credits to @ErwinBrandstetter
Slowness
In contrast to some other DBMSs, PostgreSQL doesn't automatically create indexes on foreign key columns.
One multicolumn index will usually be more efficient than three single-column indexes. Extend the foreign key index with the extra columns used in WHERE and JOIN ... ON, in the right order:
CREATE INDEX people_fk_with_kind_alive ON pets (person_id, alive, kind);
REF: https://postgresql.org/docs/11/indexes-multicolumn.html
Of course, your primary keys should be defined. The primary key will be indexed by default.

SQL joins with multiple records into one with a default

My 'people' table has one row per person, and that person has a division (not unique) and a company (not unique).
I need to join people to p_features, c_features, d_features on:
people.person=p_features.num_value
people.division=d_features.num_value
people.company=c_features.num_value
... in a way that if there is a record match in p_features/d_features/c_features only, it would be returned, but if it was in 2 or 3 of the tables, the most specific record would be returned.
From my test data below, for example, a query for person 1 would return 'FALSE', person 3 would return 'MAYBE', person 4 would return 'TRUE', and person 9 would return 'DEFAULT'.
The biggest issue is that there are 100 features and I have queries that need to return all of them in one row. My previous attempt was a function which queried on feature,num_value in each table and did a foreach, but 100 features * 4 tables meant 400 reads and it brought the database to a halt it was so slow when I loaded up a few million rows of data.
create table p_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table c_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table d_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table default_features (
feature varchar(20),
feature_value varchar(128)
);
create table people (
person int8 not null,
division int8 not null,
company int8 not null
);
insert into people values (4,5,6);
insert into people values (3,5,6);
insert into people values (1,2,6);
insert into p_features values (4,'WEARING PANTS','TRUE');
insert into c_features values (6,'WEARING PANTS','FALSE');
insert into d_features values (5,'WEARING PANTS','MAYBE');
insert into default_features values('WEARING PANTS','DEFAULT');
You need to transpose the features into rows with a ranking. Here I used a common-table expression. If your database product does not support them, you can use temporary tables to achieve the same effect.
;With RankedFeatures As
(
Select 1 As FeatureRank, P.person, PF.feature, PF.feature_value
From people As P
Join p_features As PF
On PF.num_value = P.person
Union All
Select 2, P.person, PF.feature, PF.feature_value
From people As P
Join d_features As PF
On PF.num_value = P.division
Union All
Select 3, P.person, PF.feature, PF.feature_value
From people As P
Join c_features As PF
On PF.num_value = P.company
Union All
Select 4, P.person, DF.feature, DF.feature_value
From people As P
Cross Join default_features As DF
)
, HighestRankedFeature As
(
Select Min(FeatureRank) As FeatureRank, person, feature
From RankedFeatures
Group By person, feature
)
Select RF.person, RF.FeatureRank, RF.feature, RF.feature_value
From people As P
Join HighestRankedFeature As HRF
On HRF.person = P.person
Join RankedFeatures As RF
On RF.FeatureRank = HRF.FeatureRank
And RF.person = P.person
And RF.feature = HRF.feature
Order By P.person
I don't know if I have understood your question correctly, but to use a JOIN you need your tables loaded already, and then you write a SELECT statement with INNER JOIN, LEFT JOIN, or whichever join you need.
If you post some more information, it may become easier to understand.
There are some aspects of your schema I'm not understanding, like how to relate to the default_features table if there's no match in any of the specific tables. The only possible join condition is on feature, but if there's no match in the other 3 tables, there's no value to join on. So, in my example, I've hard-coded the DEFAULT since I can't think of how else to get it.
Hopefully this can get you started and if you can clarify the model a bit more, the solution can be refined.
select p.person, coalesce(pf.feature_value, df.feature_value, cf.feature_value, 'DEFAULT')
from people p
left join p_features pf
on p.person = pf.num_value
left join d_features df
on p.division = df.num_value
left join c_features cf
on p.company = cf.num_value