Efficient/optimized query for my query using multiple UNIONS with JOIN - sql

Can someone please look at my query and suggest an improved or optimized version so that it runs faster?
Basically, I have two tables, Survey and SurveyInvite.
Sample data for Table Survey
CREATE TABLE dbo.Survey
(
createdate date,
emailinvite char(4),
phoneinvite char(4),
smsinvite char(4),
surveyid int
);
INSERT dbo.Survey VALUES
('20220201','12ab','12bc', null ,1),
('20220210','23be','45hg','45tr',2),
('20220220','65hg', null ,'89kj',3);
Sample data for Table SurveyInvite
CREATE TABLE dbo.SurveyInvite
(
sentdate date,
id char(4)
);
INSERT dbo.SurveyInvite VALUES
('20220201','12ab'),
('20220205','12bc'),
('20220210','23be'),
('20220214','45hg'),
('20220218','45tr'),
('20220220','65hg'),
('20220224','89kj');
The output should be
Type   sentdate    inviteid  surveyid
Email  2022-02-01  12ab      1
Email  2022-02-10  23be      2
Email  2022-02-20  65hg      3
Phone  2022-02-05  12bc      1
Phone  2022-02-14  45hg      2
SMS    2022-02-18  45tr      2
SMS    2022-02-24  89kj      3
Basically, I have to get the sentdate from the SurveyInvite table for each invite type (email, phone, SMS).
The Survey table should be unpivoted on the email, phone, and SMS columns to turn those columns into rows.
Here's my query
SELECT 'Email' as Type,esi.sentdate,emailinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite esi on s.emailinvite=esi.id
UNION
SELECT 'SMS' as Type,ssi.sentdate,smsinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite ssi on s.smsinvite=ssi.id
UNION
SELECT 'Phone' as Type,psi.sentdate,phoneinvite as inviteid,s.surveyid
FROM Survey s
INNER JOIN SurveyInvite psi on s.phoneinvite=psi.id
Please suggest another way to write the query if it would make it faster. I am still experimenting with UNPIVOT, LEFT JOIN, and CTEs to avoid using UNION.
Sample setup here

You don't need to query the tables three times; you can just unpivot. The easiest way to do this is with CROSS APPLY (VALUES ...):
SELECT
v.Type,
ssi.sentdate,
v.inviteid,
s.surveyid
FROM Survey s
CROSS APPLY (VALUES
('Email', s.emailinvite),
('Phone', s.phoneinvite),
('SMS', s.smsinvite)
) v (Type, inviteid)
INNER JOIN SurveyInvite ssi on v.inviteid = ssi.id;
I suggest you consider normalizing your database in the first place by storing the data unpivoted in a separate table.
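One way to sanity-check a rewrite like this is to run it next to the original UNION query on the sample data and compare the rows. The sketch below does that with Python's built-in sqlite3 module, which is an assumption made for portability and not the asker's SQL Server setup. SQLite has no CROSS APPLY, so the unpivot is emulated with a CROSS JOIN against a three-row list plus a CASE expression (the names `kinds` and `unpivoted` are hypothetical). It only compares results and says nothing about SQL Server's query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Survey (createdate TEXT, emailinvite TEXT, phoneinvite TEXT,
                     smsinvite TEXT, surveyid INTEGER);
INSERT INTO Survey VALUES
  ('2022-02-01', '12ab', '12bc', NULL,   1),
  ('2022-02-10', '23be', '45hg', '45tr', 2),
  ('2022-02-20', '65hg', NULL,   '89kj', 3);
CREATE TABLE SurveyInvite (sentdate TEXT, id TEXT);
INSERT INTO SurveyInvite VALUES
  ('2022-02-01', '12ab'), ('2022-02-05', '12bc'), ('2022-02-10', '23be'),
  ('2022-02-14', '45hg'), ('2022-02-18', '45tr'), ('2022-02-20', '65hg'),
  ('2022-02-24', '89kj');
""")

# The original approach: one pass over both tables per invite type.
union_sql = """
SELECT 'Email' AS Type, i.sentdate, s.emailinvite AS inviteid, s.surveyid
FROM Survey s JOIN SurveyInvite i ON s.emailinvite = i.id
UNION
SELECT 'SMS', i.sentdate, s.smsinvite, s.surveyid
FROM Survey s JOIN SurveyInvite i ON s.smsinvite = i.id
UNION
SELECT 'Phone', i.sentdate, s.phoneinvite, s.surveyid
FROM Survey s JOIN SurveyInvite i ON s.phoneinvite = i.id
"""

# Unpivot emulation: expand each Survey row into three (kind, inviteid) rows,
# then join to SurveyInvite once. NULL invite ids drop out in the inner join.
unpivot_sql = """
WITH kinds(kind) AS (VALUES ('Email'), ('Phone'), ('SMS')),
unpivoted AS (
  SELECT k.kind, s.surveyid,
         CASE k.kind WHEN 'Email' THEN s.emailinvite
                     WHEN 'Phone' THEN s.phoneinvite
                     ELSE s.smsinvite END AS inviteid
  FROM Survey s CROSS JOIN kinds k
)
SELECT u.kind, i.sentdate, u.inviteid, u.surveyid
FROM unpivoted u JOIN SurveyInvite i ON u.inviteid = i.id
"""

union_rows = sorted(conn.execute(union_sql).fetchall())
unpivot_rows = sorted(conn.execute(unpivot_sql).fetchall())
print(union_rows == unpivot_rows)
```

If the two row lists match on representative data, the remaining question is speed, which has to be measured on the real server and data volume.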

Another way (again the key is to only read either table once instead of three times):
SELECT i.sentdate,
[Type] = REPLACE(u.Types, 'invite', ''),
inviteid = u.id,
u.surveyid
FROM dbo.Survey AS s
UNPIVOT (Id FOR Types IN
(emailinvite, phoneinvite, smsinvite)) AS u
INNER JOIN dbo.SurveyInvite AS i ON u.Id = i.id;
As you can see from the db<>fiddle, this eliminates 4 of the 6 table scans and also an expensive distinct sort.

I assume that you have set the primary and foreign keys correctly. It might also be beneficial to have indexes on the foreign keys. See: Should every SQL Server foreign key have a matching index?.
As always with performance questions, only benchmarking the different variants can tell you which one is the fastest. The same query can perform very differently with a different set of data.
One possibility is to use joins and base the query on SurveyInvite:
SELECT
I.sentdate,
-- Survey has no id column, so test the joined row's surveyid instead
CASE WHEN SE.surveyid IS NOT NULL THEN 'Email'
WHEN SP.surveyid IS NOT NULL THEN 'Phone'
ELSE 'SMS'
END AS Type,
I.id AS inviteid,
CASE WHEN SE.surveyid IS NOT NULL THEN SE.surveyid
WHEN SP.surveyid IS NOT NULL THEN SP.surveyid
ELSE SS.surveyid
END AS surveyid
FROM
SurveyInvite I
LEFT JOIN Survey SE
ON I.id = SE.emailinvite
LEFT JOIN Survey SP
ON I.id = SP.phoneinvite
LEFT JOIN Survey SS
ON I.id = SS.smsinvite

Related

SQL Case statement with Count?

I have a database that allows for more than one ethnicity per person. Unfortunately, our answers are essentially Yes Hispanic, Not Hispanic, and Unknown, and there are some who do indeed have multiple selections. I need to run a large query that pulls lots of info, one of which is ethnicity, and I want to "convert" those that have multiple selections as Unknown.
person_ethnicity_xref table:
Person_ID  Ethnicity_ID
1234567    SLWOWQ
1234567    ZLKJDU
mstr_lists table:
Ethnicity_ID  Ethnicity
SLWOWQ        Hispanic
ZLKJDU        Not Hispanic
I've been struggling with this as I can't use a For XML Path with two tables, so I'm now trying to use the logic of
Case
When count(ethnicity_ID)>1 then 'Unknown'
Else Ethnicity
End
Here's what I have
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else ml1.mstr_list_item_desc
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr,
ml1.mstr_list_item_desc
This gave me results, but when I check them, those with >1 don't have a value of Unknown, and those people are listed twice, once with each ethnicity.
Another part of this larger query has a subquery in the FROM that counts race and a separate table join for only those with a count=1. Then the case says if the subquery that counts race came up with >1 then X otherwise use that other table for count=1. Because the race table also uses that mstr_list there's then 5 tables involved (there's a second person_id join now that I look at it more closely, and there's a mstr_list to the count and the regular tables...I have no idea why, my brain is tired and that count table isn't a simple count and also is doing something else). Is this really the only option? This query already takes over 10 min to run (it is not run on production!) and I'd hate to make it worse by duplicating what the previous writer did.
Use aggregation:
select p.person_nbr,
(case when min(ml1.mstr_list_item_desc) = max(ml1.mstr_list_item_desc)
then min(ml1.mstr_list_item_desc)
else 'Unknown'
end) as final_ethnicity
from person_table p left join
person_ethnicity_xref eth1
on p.person_id = eth1.person_id left join
mstr_lists ml1
on eth1.ethnicity_item_id = ml1.mstr_list_item_id
group by p.person_nbr;
Note: This slightly tweaks your logic. If there are multiple ethnicities and they are all the same, then that value is used.
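To make the difference between the min/max comparison and a plain count concrete, here is a small sketch using Python's built-in sqlite3 module (an assumption; the original database engine is not stated). The person numbers and the duplicated xref rows for the third person are hypothetical test data, chosen so that one person has the same ethnicity recorded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_table (person_id INTEGER, person_nbr TEXT);
CREATE TABLE person_ethnicity_xref (person_id INTEGER, ethnicity_item_id TEXT);
CREATE TABLE mstr_lists (mstr_list_item_id TEXT, mstr_list_item_desc TEXT);
INSERT INTO person_table VALUES (1, 'two_different'), (2, 'single'), (3, 'same_twice');
INSERT INTO person_ethnicity_xref VALUES
  (1, 'SLWOWQ'), (1, 'ZLKJDU'), (2, 'ZLKJDU'), (3, 'ZLKJDU'), (3, 'ZLKJDU');
INSERT INTO mstr_lists VALUES ('SLWOWQ', 'Hispanic'), ('ZLKJDU', 'Not Hispanic');
""")

# Shared FROM/GROUP BY; only the CASE expression differs between variants.
template = """
SELECT p.person_nbr, {expr} AS final_ethnicity
FROM person_table p
LEFT JOIN person_ethnicity_xref eth1 ON p.person_id = eth1.person_id
LEFT JOIN mstr_lists ml1 ON eth1.ethnicity_item_id = ml1.mstr_list_item_id
GROUP BY p.person_nbr
"""

min_max_expr = """CASE WHEN MIN(ml1.mstr_list_item_desc) = MAX(ml1.mstr_list_item_desc)
                       THEN MIN(ml1.mstr_list_item_desc) ELSE 'Unknown' END"""
count_expr = """CASE WHEN COUNT(eth1.ethnicity_item_id) > 1
                     THEN 'Unknown' ELSE MIN(ml1.mstr_list_item_desc) END"""

by_min_max = dict(conn.execute(template.format(expr=min_max_expr)).fetchall())
by_count = dict(conn.execute(template.format(expr=count_expr)).fetchall())

print(by_min_max)
print(by_count)
```

Both variants agree for the person with two different ethnicities and for the person with a single one. They differ only for the duplicated row: the min/max version keeps 'Not Hispanic', while the count version reports 'Unknown'.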
I think your query would work if you remove ml1.mstr_list_item_desc from the GROUP BY clause and use min(ml1.mstr_list_item_desc) instead of ml1.mstr_list_item_desc in the CASE expression.
The query below would work.
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else min(ml1.mstr_list_item_desc)
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr
DB-FIDDLE Example:
Schema and insert statements:
create table person_table(person_id int,person_nbr varchar(10));
insert into person_table values(1234567,'abc');
insert into person_table values(1234568,'xyz');
create table person_ethnicity_xref (Person_ID int, ethnicity_item_id varchar(50));
insert into person_ethnicity_xref values(1234567, 'SLWOWQ');
insert into person_ethnicity_xref values(1234567, 'ZLKJDU');
insert into person_ethnicity_xref values(1234568, 'ZLKJDU');
create table mstr_lists(mstr_list_item_id varchar(50), mstr_list_item_desc varchar(50));
insert into mstr_lists values('SLWOWQ','Hispanic');
insert into mstr_lists values('ZLKJDU','Not Hispanic');
Query:
select
p.person_nbr,
case
when count(eth1.ethnicity_item_id)>1 then 'Unknown'
else min(ml1.mstr_list_item_desc)
end 'final eth'
from
person_table p
left join person_ethnicity_xref eth1 on p.person_id=eth1.person_id
left join mstr_lists ml1 on eth1.ethnicity_item_id=ml1.mstr_list_item_id
group by
p.person_nbr
Output:
person_nbr  final eth
abc         Unknown
xyz         Not Hispanic
db<>fiddle here

What is the best way to join tables

This is more of a general question.
I am looking for the best way to join 4, maybe 5, different tables. I am trying to create a Power BI report that pulls live information from an IBM AS400, where customer service can type in one of our part numbers, see how many parts we have in inventory and, if there are none, see the lead time and whether any orders have already been entered for that part number.
SERI is our inventory table with 37180 records.
(active inventory that is available)
METHDM is our kit table with 37459 records.
(This table contains the bill of materials for custom kits; for example, KIT A123 contains different part numbers, which are in SERI as well.)
STKA is our part lead time table with 76796 records.
(lead time means how long will it take for parts to come in)
OCRI is our sales order table with 6497 records.
(This table contains all customer orders)
I have some knowledge of writing queries, but this one is more challenging than what I have created in the past. Should I start with the table that has the most records and left join the rest?
From STKA 76796 records
Left join METHDM 37459 records on STKA
left join SERI 37180 records on STKA
left join OCRI 6497 records on STKA
Select
STKA.v6part as part,
STKA.v6plnt as plant,
STKA.v6tdys as pur_leadtime,
STKA.v6prpt as Pur_PrepLeadtime,
STKA.v6lead as Mfg_leadtime,
STKA.v6prpt as Mfg_PrepLeadTime,
METHDM.AQMTLP AS COMPONENT,
METHDM.AQQPPC AS QTYNEEDED,
SERI.HTLOTN AS BATCH,
SERI.HTUNIT AS UOM,
(HTQTY - HTQTYC) as ONHAND,
OCRI.DDORD# AS SALESORDER,
OCRI.DDRDAT AS PROMISED
from stka
left join METHDM on STKA.V6PART = METHDM.AQPART
left join SERI on STKA.V6PART = SERI.HTPART
left join OCRI on STKA.V6PART = OCRI.DDPART
Is this the best way to join the tables?
I think you already have your answer, but conceptually, there are a few issues here to deal with, and I figured I would give you a few examples, using data a little bit like yours, but massively simplified.
CREATE TABLE #STKA (V6PART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #METHDM (AQPART INT, KIT_ID INT, SOME_DATE DATETIME, OTHER_DATA VARCHAR(50));
CREATE TABLE #SERI (HTPART INT, OTHER_DATA VARCHAR(50));
CREATE TABLE #OCRI (DDPART INT, OTHER_DATA VARCHAR(50));
INSERT INTO #STKA SELECT 1, NULL UNION ALL SELECT 2, NULL UNION ALL SELECT 3, NULL; --1, 2, 3 Ids
INSERT INTO #METHDM SELECT 1, 1, '20200108 10:00', NULL UNION ALL SELECT 1, 2, '20200108 11:00', NULL UNION ALL SELECT 2, 1, '20200108 13:00', NULL; --1 Id appears twice, 2 Id once, no 3 Id
INSERT INTO #SERI SELECT 1, NULL UNION ALL SELECT 3, NULL; --1 and 3 Ids
INSERT INTO #OCRI SELECT 1, NULL UNION ALL SELECT 4, NULL; --1 and 4 Ids
So fundamentally we have a few issues here:
o the first problem is that the IDs in the tables differ, one table has an ID #4 but this isn't in any of the others;
o the second issue is that we have multiple rows for the same ID in one table;
o the third issue is that some tables are "missing" IDs that are in other tables, which you already covered by using LEFT JOINs, so I will ignore this.
--This will select ID 1 twice, 2 once, 3 once, and miss 4 completely
SELECT
*
FROM
#STKA
LEFT JOIN #METHDM ON #METHDM.AQPART = #STKA.V6PART
LEFT JOIN #SERI ON #SERI.HTPART = #STKA.V6PART
LEFT JOIN #OCRI ON #OCRI.DDPART = #STKA.V6PART;
So the problem here is that we don't have every ID in our "anchor" table STKA, and in fact there's no single table that has every ID in it. Now your data might be fine here, but if it isn't then you can simply add a step to find every ID, and use this as the anchor.
--This will select each ID, but still doubles up on ID 1
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN #METHDM ON #METHDM.AQPART = I.Id
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
That's using a common-table expression, but a subquery would also do the job. However, this still leaves us with an issue where ID 1 appears twice in the list, because it has multiple rows in one of the sub-tables.
One way to fix this is to pick the row with the latest date, or any other ORDER you can apply to the data:
--Pick the best row for the table where it has multiple rows, now we get one row per ID
WITH Ids AS (
SELECT V6PART AS ID FROM #STKA
UNION
SELECT AQPART AS ID FROM #METHDM
UNION
SELECT HTPART AS ID FROM #SERI
UNION
SELECT DDPART AS ID FROM #OCRI),
BestMETHDM AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY AQPART ORDER BY SOME_DATE DESC) AS ORDER_ID
FROM
#METHDM)
SELECT
*
FROM
Ids I
LEFT JOIN #STKA ON #STKA.V6PART = I.Id
LEFT JOIN BestMETHDM ON BestMETHDM.AQPART = I.Id AND BestMETHDM.ORDER_ID = 1
LEFT JOIN #SERI ON #SERI.HTPART = I.Id
LEFT JOIN #OCRI ON #OCRI.DDPART = I.Id;
Of course you could also add some aggregation (SUM, MAX, MIN, AVG, etc.) to fix this problem (if it is indeed an issue). Also, I used a common-table expression, but this would work just as well with a subquery.
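As a rough illustration of the aggregation option, the sketch below collapses the multi-row table to one row per part before joining, so the anchor table is not multiplied. It uses Python's built-in sqlite3 module instead of the temp tables above (an assumption for portability), keeps only two of the simplified tables, and the output column names KIT_ROWS and LATEST_DATE are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STKA (V6PART INTEGER);
CREATE TABLE METHDM (AQPART INTEGER, KIT_ID INTEGER, SOME_DATE TEXT);
INSERT INTO STKA VALUES (1), (2), (3);
INSERT INTO METHDM VALUES
  (1, 1, '2020-01-08 10:00'),
  (1, 2, '2020-01-08 11:00'),
  (2, 1, '2020-01-08 13:00');
""")

# Aggregate METHDM to one row per part first, then LEFT JOIN the result,
# so part 1 (two METHDM rows) still appears only once in the output.
rows = conn.execute("""
SELECT s.V6PART, m.KIT_ROWS, m.LATEST_DATE
FROM STKA AS s
LEFT JOIN (
  SELECT AQPART, COUNT(*) AS KIT_ROWS, MAX(SOME_DATE) AS LATEST_DATE
  FROM METHDM
  GROUP BY AQPART
) AS m ON m.AQPART = s.V6PART
ORDER BY s.V6PART
""").fetchall()

for row in rows:
    print(row)
```

Part 3 has no METHDM rows, so its aggregated columns come back as NULL (None in Python), which is the usual LEFT JOIN behavior.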
Expanding on a comment made on the question..
I would say I will start with SERI as that table contains the entire inventory for our facility and should cover the other tables
However the question said
SERI is our inventory table with 37180 records. (active inventory that is available)
In my experience, active inventory isn't the same as all parts.
Normally, in a query like this, I'd expect the first table to be a Parts Master table of some sort that contains every possible part ID.

How to work in case in join condition

How can I find the city when a ContactID is provided? For example, if the ContactID is 123, the query should check whether the contact is P or C. If it is P, it should go to the Person table and return the City (USA); if it is C, it should go to the Company table and return the City (AUS).
NB: all tables contain thousands of records, and the City value is looked up at run time.
Unless you're dynamically generating the query (i.e. using some language other than SQL to execute it) then you need to join on both tables anyway. If you're joining on both tables then there's no need for a CASE statement:
select *
from contacts co
left outer join person p
on co.contactid = p.contactid
and co.person_company = 'P'
left outer join company c
on co.contactid = c.contactid
and co.person_company = 'C'
You'll start noticing an issue here: for every column from PERSON and COMPANY, you're going to have to add some business logic to work out which table you want the information from. This can get very tiresome:
select co.contactid
, case when p.id is not null then p.name else c.name end as name
from contacts co
left outer join person p
on co.contactid = p.contactid
and co.person_company = 'P'
left outer join company c
on co.contactid = c.contactid
and co.person_company = 'C'
Your PERSON and COMPANY tables seem to have exactly the same information in them. If this is true in your actual data model then there's no need to split them up. You make the determination as to whether each entity is a person or a company in your CONTACTS table.
Creating additional tables to store data in this manner is only really helpful if you need to store additional data. Even then, I'd still put the data that means the same thing for a person or a company (i.e. name or address) in a single table.
If there's a one-to-one relationship between CONTACTID and PID and between CONTACTID and CID, which is what your sample data implies, then you have a number of additional IDs that have no value.
Lastly, if you're not enforcing that only companies can go in the COMPANY table and only individuals in the PERSON table, you need the PERSON_COMPANY column to exist in both PERSON and COMPANY, though as a fixed string. It would be more normal to set up this data model as something like the following:
create table contacts (
id integer not null
, contact_type char(1) not null
, name varchar2(4000) not null
, city varchar2(3)
, constraint pk_contacts primary key (id)
, constraint uk_contacts unique (id, contact_type)
);
create table people (
id integer not null
, contact_type char(1) not null
, some_extra_info varchar2(4000)
, constraint pk_people primary key (id)
, constraint fk_people_contacts
foreign key (id, contact_type)
references contacts (id, contact_type)
, constraint chk_people_type check (contact_type = 'P')
);
etc.
You can LEFT JOIN all 3 tables and then use a CASE expression to select the one that you need based on the P or C value:
SELECT
CASE c.[Person/Company]
WHEN 'P' THEN p.NAME
WHEN 'C' THEN a.Name
END AS Name
FROM Contact c
LEFT JOIN Person p on p.ContactId = c.ContactId
LEFT JOIN Company a on a.ContactId = c.ContactId
Ben's answer is almost right. You might want to check that the first join has no match before doing the second one:
select c.*, coalesce(p.name, co.name) as name
from contacts c left outer join
person p
on c.contactid = p.contactid and
c.person_company = 'P' left join
company co
on c.contactid = co.contactid and
c.person_company = 'C' and
p.contactid is null;
This may not be important in your case. But in the event that the second join matches multiple rows and the first matches a single row, you might not want the additional rows in the output.
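The conditional-join idea from these answers can be tried out on a tiny data set. The sketch below uses Python's built-in sqlite3 module (an assumption; the original engine is not stated), and the table layout, the `city_for` helper, and the stray company row for contact 123 are all hypothetical. The stray row is there to show that the type condition in the join keeps it from being picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contactid INTEGER, person_company TEXT);
CREATE TABLE person (contactid INTEGER, city TEXT);
CREATE TABLE company (contactid INTEGER, city TEXT);
INSERT INTO contacts VALUES (123, 'P'), (456, 'C');
INSERT INTO person VALUES (123, 'USA');
INSERT INTO company VALUES (456, 'AUS'), (123, 'NZ');
""")

def city_for(contact_id):
    """Return the city from person or company, depending on the contact type."""
    row = conn.execute("""
        SELECT COALESCE(p.city, c.city)
        FROM contacts AS co
        LEFT JOIN person AS p
          ON co.contactid = p.contactid AND co.person_company = 'P'
        LEFT JOIN company AS c
          ON co.contactid = c.contactid AND co.person_company = 'C'
        WHERE co.contactid = ?
    """, (contact_id,)).fetchone()
    return row[0] if row else None

print(city_for(123), city_for(456), city_for(999))
```

Contact 123 is typed 'P', so the company join condition fails for it and the 'NZ' row is ignored; an unknown contact id simply returns None.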

SQL Exclude results from join or subquery where column names don't match up with multiple tables

Ok, I'm working with an existing database (cannot edit the tables, columns, etc.) I am asked to create a report with the data for clients. This worked fine until we needed to create an exclusions group for certain clients.
There are 8 tables that need to be parsed for information in the database to execute this query properly. I've "simplified" it to 6 tables as best as I can.
The tables below are the existing tables, in the best order that I could come up with.
Table: Clients
ClientKey ClientNo ClientName
1 12345 ABC
2 12346 DEF
3 12347 GHI
4 12348 JKL
5 12349 MNO
6 12350 PQR
Table: ClientGroup
ClientKey GroupCode GroupValue
12345 EXCLUSIONSGROUP EXCLUDE
12346 EXCLUSIONSGROUP EXCLUDE
12347 OTHERSTUFF SOMETHING
Table: Groups
GroupCode GroupCodeKey
EXCLUSIONSGROUP 25
OTHERSTUFF 14
Table: GroupValues
GroupCode GroupValue
EXCLUSIONSGROUP EXCLUDE
OTHERSTUFF SOMETHING
EXCLUSIONSGROUP SOMETHING
Table: Images
FileKey Filename
987654 NULL
987653 Filename.jpg
987652 Filename.jpg
987651 NULL
987650 NULL
Table: Files
FileKey ClientKey
987654 12345
987653 12345
987652 12346
987651 12347
987650 12347
To better explain these tables:
Clients holds our clients
ClientGroup holds a list of which clients belong to which groups and the value that this client was assigned in that group (clients can be assigned multiple groups, and/or multiple values for a group)
Groups holds a list of the groups that exist as well as the GroupCodeKey. This table is important to refer to because the GroupCode values can change, so referring to '25' for example is the best way to access the proper GroupCode
GroupValues holds a list of all the possible GroupValues that can be assigned to a GroupCode (Group). They may be added, removed, changed.
Images points to the Files table through the FileKey column which points to the Clients table through the ClientKey column. The Images table tells us if a client's file has an image or not (defined by NULL if it does not exist)
Files contains a list of all documents that belong to a client. A client can (and most likely will) have multiple documents/files.
What I need to do:
I need to find all instances where the Filename in the Images table is NULL and where the client is NOT in the ClientGroup table with a GroupValue of 'EXCLUDE' (as listed in the GroupValues table) for the group whose GroupCodeKey in the Groups table is '25'.
In the following code, ignore columns that are unseen in the tables above, they exist, however to simplify the code and tables above, I've removed them from the code. They are still relevant to mention in the code below under select as both queries pull different columns' information from the database which prevents me from doing an EXCEPT between both queries
The current code (simplified) I have to get all the Clients with NULL Images is:
SELECT f.FileKey AS fkey, f.fNo AS fno, f.fDate AS fdate, cli.ClientNo AS clientno, cli.ClientName AS clientname, /*OTHER TABLE STUFF*/
FROM Files as f
LEFT JOIN Images as img
ON f.FileKey=img.FileKey
/*OTHER LEFT JOINS AND TABLES HERE RETURNING OTHER DATA*/
WHERE
img.[Filename] IS NULL
ORDER BY f.FileKey DESC
The current code I have to get all the Clients that are in the group with a GroupCodeKey of '25' and with a GroupValue of 'EXCLUDE' is:
SELECT cli.ClientNo AS clientno, cli.ClientName AS clientname
FROM Clients AS cli
LEFT JOIN ClientGroup AS cg
ON cg.ClientKey = cli.ClientKey
LEFT JOIN Groups AS gc
ON gc.GroupCode = cg.GroupCode
LEFT JOIN GroupValue AS gv
ON gv.GroupCode = gc.GroupCode
WHERE
gc.GroupCodeKey='25' AND
gv.GroupValue='EXCLUDE'
Both above queries work exactly as anticipated on their own.
How would I combine these queries to give me the desired output?
The desired output (according to the tables above and their contents) would be to have the information that matches the first query minus the second one:
ClientNo:12347
ClientName:GHI
FileKey:987651
AND
ClientNo:12347
ClientName:GHI
FileKey:987650
Both these results match a client not belonging to the exceptions group '25' with value of 'EXCLUDE' and both FileKeys (987651 and 987650) have Filename set to NULL
I have tried to join all the tables, but cannot seem to properly create the query (I get either no results or I get results for the clients in the exception group only - whereas I need the ones not in the exceptions group). I have also tried creating a subquery, but I couldn't seem to get that to work either...
Any help regarding this is much appreciated.
Thanks!
Not sure if I have all the criteria right, but the general form of what you want can be gotten using NOT EXISTS
SELECT clientno, clientname, fkey, /*OTHER TABLE STUFF*/
FROM (
SELECT f.FileKey AS fkey, f.fNo AS fno, f.fDate AS fdate, cli.ClientNo AS clientno, cli.ClientName AS clientname, /*OTHER TABLE STUFF*/
FROM Files AS f
LEFT JOIN Images AS img
ON f.FileKey=img.FileKey
/*OTHER LEFT JOINS AND TABLES HERE RETURNING OTHER DATA*/
WHERE
img.[Filename] IS NULL
) incl
WHERE NOT EXISTS (
SELECT * FROM(
SELECT cli.ClientNo AS clientno, cli.ClientName AS clientname
FROM Clients AS cli
LEFT JOIN ClientGroup AS cg
ON cg.ClientKey = cli.ClientKey
LEFT JOIN Groups AS gc
ON gc.GroupCode = cg.GroupCode
LEFT JOIN GroupValue AS gv
ON gv.GroupCode = gc.GroupCode
WHERE
gc.GroupCodeKey='25' AND
gv.GroupValue='EXCLUDE'
) excl
WHERE excl.clientno = incl.ClientNo
AND excl.clientname = incl.ClientName
)
ORDER BY fkey DESC
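A compact version of the NOT EXISTS pattern can be checked against the sample rows from the question. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module, the tables are renamed and reduced (`client_groups`, `group_codes` and the snake_case columns are hypothetical), and the client keys are made consistent across tables, which the posted sample data is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_key INTEGER, client_no INTEGER, client_name TEXT);
CREATE TABLE client_groups (client_key INTEGER, group_code TEXT, group_value TEXT);
CREATE TABLE group_codes (group_code TEXT, group_code_key INTEGER);
CREATE TABLE files (file_key INTEGER, client_key INTEGER);
CREATE TABLE images (file_key INTEGER, filename TEXT);
INSERT INTO clients VALUES (1, 12345, 'ABC'), (2, 12346, 'DEF'), (3, 12347, 'GHI');
INSERT INTO client_groups VALUES
  (1, 'EXCLUSIONSGROUP', 'EXCLUDE'),
  (2, 'EXCLUSIONSGROUP', 'EXCLUDE'),
  (3, 'OTHERSTUFF', 'SOMETHING');
INSERT INTO group_codes VALUES ('EXCLUSIONSGROUP', 25), ('OTHERSTUFF', 14);
INSERT INTO files VALUES (987654, 1), (987653, 1), (987652, 2), (987651, 3), (987650, 3);
INSERT INTO images VALUES
  (987654, NULL), (987653, 'Filename.jpg'), (987652, 'Filename.jpg'),
  (987651, NULL), (987650, NULL);
""")

# Files with no image, for clients that have no 'EXCLUDE' row in group 25.
rows = conn.execute("""
SELECT c.client_no, c.client_name, f.file_key
FROM files AS f
JOIN clients AS c ON c.client_key = f.client_key
LEFT JOIN images AS img ON img.file_key = f.file_key
WHERE img.filename IS NULL
  AND NOT EXISTS (
    SELECT 1
    FROM client_groups AS cg
    JOIN group_codes AS g ON g.group_code = cg.group_code
    WHERE cg.client_key = c.client_key
      AND g.group_code_key = 25
      AND cg.group_value = 'EXCLUDE'
  )
ORDER BY f.file_key DESC
""").fetchall()

for row in rows:
    print(row)
```

Correlating the subquery on the client key, instead of on client number and name, keeps the exclusion check tied to a single identifying column.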
SELECT im.FileKey
FROM Images AS im
INNER JOIN Files as Fi ON im.filekey = fi.filekey
--NOW THAT WE KNOW ALL CLIENT KEYS THAT MEET THE REQUIREMENTS WE NEED TO GO GET THE FILEKEY
INNER JOIN (
-- FIND ALL FILES WHOSE IMAGE FILENAME IS NULL
SELECT F.clientKey
FROM images AS i
INNER JOIN files AS F on f.fileKey = i.fileKey
WHERE i.filename IS NULL
INTERSECT--GRAB THE INTERSECT BECAUSE WE WANT TO KNOW ALL CLIENTKEYS THAT MEET THE BELOW REQUIREMENTS
--FIND ALL CLIENTS THAT ARE NOT IN EXCLUDED AND IN THE GROUPS TABLE WITH GROUPCODE =25
SELECT C.ClientKey
FROM CLIENTS AS C
INNER JOIN CLientGroup AS CG ON C.clientKey = cg.clientkey AND cg.groupvalue != 'EXCLUDE'
INNER JOIN Groups AS g on CG.groupCode = g.groupCode AND g.groupCodeKey = '25') AS x on x.clientkey = fi.clientkey
I believe this will get you what you want (untested).

SQL joins with multiple records into one with a default

My 'people' table has one row per person, and that person has a division (not unique) and a company (not unique).
I need to join people to p_features, c_features, d_features on:
people.person=p_features.num_value
people.division=d_features.num_value
people.company=c_features.num_value
... in a way that if there is a record match in p_features/d_features/c_features only, it would be returned, but if it was in 2 or 3 of the tables, the most specific record would be returned.
From my test data below, for example, a query for person=1 would return 'FALSE', person 3 would return 'MAYBE', person 4 would return 'TRUE', and person 9 would return 'DEFAULT'.
The biggest issue is that there are 100 features and I have queries that need to return all of them in one row. My previous attempt was a function which queried on feature,num_value in each table and did a foreach, but 100 features * 4 tables meant 400 reads and it brought the database to a halt it was so slow when I loaded up a few million rows of data.
create table p_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table c_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table d_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table default_features (
feature varchar(20),
feature_value varchar(128)
);
create table people (
person int8 not null,
division int8 not null,
company int8 not null
);
insert into people values (4,5,6);
insert into people values (3,5,6);
insert into people values (1,2,6);
insert into p_features values (4,'WEARING PANTS','TRUE');
insert into c_features values (6,'WEARING PANTS','FALSE');
insert into d_features values (5,'WEARING PANTS','MAYBE');
insert into default_features values('WEARING PANTS','DEFAULT');
You need to transpose the features into rows with a ranking. Here I used a common-table expression. If your database product does not support them, you can use temporary tables to achieve the same effect.
;With RankedFeatures As
(
Select 1 As FeatureRank, P.person, PF.feature, PF.feature_value
From people As P
Join p_features As PF
On PF.num_value = P.person
Union All
Select 2, P.person, PF.feature, PF.feature_value
From people As P
Join d_features As PF
On PF.num_value = P.division
Union All
Select 3, P.person, PF.feature, PF.feature_value
From people As P
Join c_features As PF
On PF.num_value = P.company
Union All
Select 4, P.person, DF.feature, DF.feature_value
From people As P
Cross Join default_features As DF
)
, HighestRankedFeature As
(
-- Rank per person and feature, so each feature falls back independently
Select Min(FeatureRank) As FeatureRank, person, feature
From RankedFeatures
Group By person, feature
)
Select RF.person, RF.FeatureRank, RF.feature, RF.feature_value
From people As P
Join HighestRankedFeature As HRF
On HRF.person = P.person
Join RankedFeatures As RF
On RF.FeatureRank = HRF.FeatureRank
And RF.person = P.person
And RF.feature = HRF.feature
Order By P.person
I'm not sure I have understood your question correctly, but to use JOIN, your tables need to be loaded already; then you can use a SELECT statement with INNER JOIN, LEFT JOIN, or whatever you need.
If you post some more information, it may be easier to understand.
There are some aspects of your schema I'm not understanding, like how to relate to the default_features table if there's no match in any of the specific tables. The only possible join condition is on feature, but if there's no match in the other 3 tables, there's no value to join on. So, in my example, I've hard-coded the DEFAULT since I can't think of how else to get it.
Hopefully this can get you started and if you can clarify the model a bit more, the solution can be refined.
select p.person, coalesce(pf.feature_value, df.feature_value, cf.feature_value, 'DEFAULT')
from people p
left join p_features pf
on p.person = pf.num_value
left join d_features df
on p.division = df.num_value
left join c_features cf
on p.company = cf.num_value
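One possible way to avoid hard-coding the default is to drive the query from default_features, so that every person gets one row per feature and each specific table is joined on both its key and the feature name. The sketch below tries this with Python's built-in sqlite3 module (an assumption; the question looks like PostgreSQL). It also assumes that every feature has a row in default_features, and the person 9 row is hypothetical, added so the default case shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE p_features (num_value INTEGER, feature TEXT, feature_value TEXT);
CREATE TABLE c_features (num_value INTEGER, feature TEXT, feature_value TEXT);
CREATE TABLE d_features (num_value INTEGER, feature TEXT, feature_value TEXT);
CREATE TABLE default_features (feature TEXT, feature_value TEXT);
CREATE TABLE people (person INTEGER NOT NULL, division INTEGER NOT NULL,
                     company INTEGER NOT NULL);
INSERT INTO people VALUES (4, 5, 6), (3, 5, 6), (1, 2, 6), (9, 8, 7);
INSERT INTO p_features VALUES (4, 'WEARING PANTS', 'TRUE');
INSERT INTO c_features VALUES (6, 'WEARING PANTS', 'FALSE');
INSERT INTO d_features VALUES (5, 'WEARING PANTS', 'MAYBE');
INSERT INTO default_features VALUES ('WEARING PANTS', 'DEFAULT');
""")

# One row per (person, feature); COALESCE picks the most specific value:
# person, then division, then company, then the default.
rows = conn.execute("""
SELECT p.person, d.feature,
       COALESCE(pf.feature_value, df.feature_value,
                cf.feature_value, d.feature_value) AS feature_value
FROM people AS p
CROSS JOIN default_features AS d
LEFT JOIN p_features AS pf
  ON pf.num_value = p.person AND pf.feature = d.feature
LEFT JOIN d_features AS df
  ON df.num_value = p.division AND df.feature = d.feature
LEFT JOIN c_features AS cf
  ON cf.num_value = p.company AND cf.feature = d.feature
ORDER BY p.person, d.feature
""").fetchall()

for row in rows:
    print(row)
```

This reads each feature table once for all features. Turning the result into one row per person with 100 columns would still need a pivot step, which is not shown here.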