JOINS and absentee values - sql

INSERT INTO Shipments (Column1...Column200)
SELECT
O.Value1,...
CL.Value199,
isnull(P.PriceFactor1,1)
FROM Orders O
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode
JOIN Prices P on CL.PriceKey = P.PriceKey
WHERE O.PriceFactor1 = P.PriceFactor1
AND O.PriceFactor2 = P.PriceFactor2
AND O.PriceFactor3 = P.PriceFactor3
The above query (part of a new stored procedure meant to replace an old and nasty one that used a cursor...) fails to return some rows, because the rows in Orders do not have matching rows in Prices. In such cases, we want the last value in the INSERT list to be 1 by default. Instead, the row is never built; or, when we tried to fix it by changing the WHERE conditions, it brought PriceFactor1 from a different row, which is also no good.
How it's supposed to work:
A row is created in table Orders. A third-party program then executes a stored procedure (Asp_BuildShipments) and displays the results once they have been inserted into table Shipments. This SP is meant to populate the table Shipments by pulling values from Orders, Clients, Drivers, Vehicles, Prices, Routes, and others. It's a long SP, and the array of tables is big and varied.
In table Orders:
PriceFactor1 | PriceFactor2 | PriceFactor3
12 | 10 | 8
In table Prices:
PriceFactor1 | PriceFactor2 | PriceFactor3
18 | 12 | 10
In a case such as this, the SP needs to recognize that no such rows exist in Prices and use a default value of 1 rather than skipping the row or pulling the price from a different row.
We've tried isnull(), CASE statements, and WHERE EXISTS, but to no avail.
The new SP is set based, and we want to leave it that way - the old one took minutes, the new one takes only a few seconds. But without passing row by row, we aren't sure how to check each individual Order to see if has a matching Price before building the row in Shipments.
I know there are details missing here, but I didn't want to write a 1,000 page question. If these details are insufficient, I'll post as much as I need to to help get your brains storming. Been stuck on this for a while now...
Thanks in advance :)

Could this be as simple as:
SELECT
O.Value1,...
CL.Value199,
isnull(P.PriceFactor1,1)
FROM Orders O
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode
LEFT JOIN Prices P on CL.PriceKey = P.PriceKey
AND O.PriceFactor1 = P.PriceFactor1
AND O.PriceFactor2 = P.PriceFactor2
AND O.PriceFactor3 = P.PriceFactor3

Richard Hansell's answer should work if the problem is as you suggest there are no matching rows in Prices. Since it's not working the problem is in the other joins.
One of these joins filters out data.
JOIN Clients CL on O.ClientNo = CL.ClientNo
JOIN Calc C on CL.CalcCode = C.CalcCode

Related

SQL - join three tables based on (different) latest dates in two of them

Using Oracle SQL Developer, I have three tables with some common data that I need to join.
Appreciate any help on this!
Please refer to https://i.stack.imgur.com/f37Jh.png for the input and desired output (table formatting doesn't work on all tables).
These tables are made up in order to anonymize them, and in reality contain other data with millions of entries, but you could think of them as representing:
Product = Main product categories in a grocery store.
Subproduct = Subcategory products to the above. Each time the table is updated, the main product category may loses or get some new suproducts assigned to it. E.g. you can see that from May to June the Pulled pork entered while the Fishsoup was thrown out.
Issues = Status of the products, for example an apple is bad if it has brown spots on it..
What I need to find is: for each P_NAME, find the latest updated set of subproducts (SP_ID and SP_NAME), and append that information with the latest updated issue status (STATUS_FLAG).
Please note that each main product category gets its set of subproducts updated at individual occasions i.e. 1234 and 5678 might be "latest updated" on different dates.
I have tried multiple queries but failed each time. I am using combos of SELECT, LEFT OUTER JOIN, JOIN, MAX and GROUP BY.
Latest attempt, which gives me the combo of the first two tables, but missing the third:
SELECT
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE
FROM SUBPRODUCT
LEFT OUTER JOIN PRODUCT ON PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
JOIN(SELECT SP_PRODUCT_ID, MAX(SP_VALUE_DATE) AS latestdate FROM SUBPRODUCT GROUP BY SP_PRODUCT_ID) sub ON
sub.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID AND sub.latestDate = SUBPRODUCT.SP_VALUE_DATE;
Trying to find a row with a max value is a common SQL pattern - you can do it with a join, like your example, but it's usually more clear to use a subquery or a window function.
Correlated subquery example
select
PRODUCT.P_NAME,
SUBPRODUCT.SP_PRODUCT_ID, SUBPRODUCT.SP_NAME, SUBPRODUCT.SP_ID, SUPPRODUCT.SP_VALUE_DATE,
ISSUES.STATUS_FLAG, ISSUES.STATUS_LAST_UPDATED
from PRODUCT
join SUBPRODUCT
on PRODUCT.P_ID = SUBPRODUCT.SP_PRODUCT_ID
and SUBPRODUCT.SP_VALUE_DATE = (select max(S2.SP_VALUE_DATE) as latestDate
from SUBPRODUCT S2
where S2.SP_PRODUCT_ID = SUBPRODUCT.SP_PRODUCT_ID)
join ISSUES
on ISSUES.ISSUE_ID = SUBPRODUCT.SP_ID
and ISSUES.STATUS_LAST_UPDATED = (select max(I2.STATUS_LAST_UPDATED) as latestDate
from ISSUES I2
where I2.ISSUE_ID = ISSUES.ISSUE_ID)
Window function / inline view example
select
PRODUCT.P_NAME,
S.SP_PRODUCT_ID, S.SP_NAME, S.SP_ID, S.SP_VALUE_DATE,
I.STATUS_FLAG, I.STATUS_LAST_UPDATED
from PRODUCT
join (select SUBPRODUCT.*,
max(SP_VALUE_DATE) over (partition by SP_PRODUCT_ID) as latestDate
from SUBPRODUCT) S
on PRODUCT.P_ID = S.SP_PRODUCT_ID
and S.SP_VALUE_DATE = S.latestDate
join (select ISSUES.*,
max(STATUS_LAST_UPDATED) over (partition by ISSUE_ID) as latestDate
from ISSUES) I
on I.ISSUE_ID = S.SP_ID
and I.STATUS_LAST_UPDATED = I.latestDate
This often performs a bit better, but window functions can be tricky to understand.

SQL One to many relationships, duplicate/results being returned as unique rows

I have a database and some of the tables have one to many relationships. How do I eliminate the results being returned as its own unique row?
For instance I have a initiative table and a initiative can have many funding requirements. When I perform an inner join I'm getting the results but it looks like the rows are duplicating to output a unique value from the funding table.
From the results, it should be like this
Row 3,4,5 should be in one row listing the results with the the funding required
Description | Acad_priority_1 | Acad_priority_2 | beginning_fiscal_year |
Develop... | false | true | 2018/2019 |
| 2018/2019 |
| 2019/2020
Can you please steer me in the right direction or show me how the SQL should be structured to achieve this?
SQL:
SELECT plan_master.plan_id,
plan_master.date_submitted,
plan_master.filename,
initiative_master.plan_id,
initiative_master.NAME,
initiative_master.acad_priority_1,
funding.initiative_id,
funding.beginning_fiscal_year
FROM plan_master
JOIN initiative_master
ON plan_master.plan_id = initiative_master.plan_id
JOIN funding
ON initiative_master.initiative_id = funding.initiative_id
ORDER BY Filename
|plan_id|date_submitted|filename|plan_id|NAME|acad_priority_1|initiative_id|begginning_fiscal_year|
|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|2018-12-03|1.txt|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|Space Utilization framework|false|8CCE0311-0E3C-467D-B675-04817A473056|2018/2019
|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|2018-12-03|1.txt|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|Space Utilization framework|false|8CCE0311-0E3C-467D-B675-04817A473056|2019/2020
|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|2018-12-03|1.txt|16F44FFE-5434-4E52-9D9A-F45C0A49D8E2|Space Utilization framework|false|8CCE0311-0E3C-467D-B675-04817A473056|2020/2021
The 2 beginning fiscal year values cannot be combined into a single row unless you want to concatenate them with commas (or another separator), or write a function to show the value as a range, for example 2018-2020. You can however get rid of the 4th record using distinct or using the below mentioned over/partition by clauses.
If you don't mind, can you run the following query and provide the results that will help me identify the duplication issue:
SELECT plan_master.plan_id,
plan_master.date_submitted,
plan_master.filename,
plan_master.department,
plan_master.last_name,
plan_master.first_name,
plan_master.email,
plan_master.mission_statement,
plan_master.vision_statement,
plan_master.goals_objectives,
initiative_master.plan_id,
initiative_master.NAME,
initiative_master.description,
initiative_master.acad_priority_1,
initiative_master.acad_priority_2,
initiative_master.acad_priority_3,
initiative_master.acad_priority_4,
initiative_master.acad_priority_5,
initiative_master.acad_priority_6,
initiative_master.operational_sustainability,
initiative_master.people_plan,
funding.initiative_id,
funding.beginning_fiscal_year
FROM plan_master
JOIN initiative_master
ON plan_master.plan_id = initiative_master.plan_id
JOIN funding
ON initiative_master.initiative_id = funding.initiative_id
ORDER BY Filename
Once you get to the cause, you can either use a better join clause (multiple conditions), add a where clause, or use the OVER clause in conjunction with the PARTITION BY clause to filter the data based on a ROW_NUMBER().
A simple join returns a cartesion product. If one table has 2 rows and another has 3, then there will be 6 rows of data. Need to do distinct on the data. You can do this:
SELECT plan.date_submitted,
plan.filename,
plan.department,
plan.last_name,
plan.first_name,
plan.email,
plan.mission_statement,
plan.vision_statement,
plan.goals_objectives,
initiative.Name,
initiative.description,
initiative.acad_priority_1,
initiative.acad_priority_2,
initiative.acad_priority_3,
initiative.acad_priority_4,
initiative.acad_priority_5,
initiative.acad_priority_6
FROM plan_master as plan
inner join (select distinct init.plan_id, init.NAME,
init.description,
init.acad_priority_1,
init.acad_priority_2,
init.acad_priority_3,
init.acad_priority_4,
init.acad_priority_5,
init.acad_priority_6,
init.operational_sustainability,
init.people_plan,
funding.beginning_fiscal_year from initiative_master as init
join funding on funding.initiative_id = init.initiative_id ) as initiative
ON plan.plan_id = initiative.plan_id
ORDER BY Filename

Defaulting missing data

I have a complex set of schema that I am trying to pull data out of for a report. The query for it joins a bunch of tables together and I am specifically looking to pull a subset of data where everything for it might be null. The original relations for the tables look as such.
Location.DeptFK
Dept.PK
Section.DeptFK
Subsection.SectionFK
Question.SubsectionFK
Answer.QuestionFK, SubmissionFK
Submission.PK, LocationFK
From here my problems begin to compound a little.
SELECT Section.StepNumber + '-' + Question.QuestionNumber AS QuestionNumberVar,
Question.Question,
Subsection.Name AS Subsection,
Section.Name AS Section,
SUM(CASE WHEN (Answer.Answer = 0) THEN 1 ELSE 0 END) AS NA,
SUM(CASE WHEN (Answer.Answer = 1) THEN 1 ELSE 0 END) AS AnsNo,
SUM(CASE WHEN (Answer.Answer = 2) THEN 1 ELSE 0 END) AS AnsYes,
(select count(distinct Location.Abbreviation) from Department inner join Plant on location.DepartmentFK = Department.PK WHERE(Department.Name = 'insertParameter'))
as total
FROM Department inner join
section on Department.PK = section.DepartmentFK inner JOIN
subsection on Subsection.SectionFK = Section.PK INNER JOIN
question on Question.SubsectionFK = Subsection.PK INNER JOIN
Answer on Answer.QuestionFK = question.PK inner JOIN
Submission on Submission.PK = Answer.SubmissionFK inner join
Location on Location.DepartmentFK = Department.PK AND Location.pk = Submission.PlantFK
WHERE (Department.Name = 'InsertParameter') AND (Submission.MonthTested = '1/1/2017')
GROUP BY Question.Question, QuestionNumberVar, Subsection.Name, Section.Name, Section.StepNumber
ORDER BY QuestionNumberVar;
There are 15 total locations, with this query I get 12. If I remove a relation in the join for Location I get 15 total locations but my answer data gets multiplied by 15. My issue is that not all locations are required to test at the same time so their answers should default to NA, They don't get records placed in the DB so the relationship between Location/Submission is absent.
I have a workaround almost in place via the select count distinct but, The second part is a query for finding what each location answered instead of a sum which brings the problem right back around. It also has to be dynamic because the input parameters for a department won't bring a static number of locations back each time.
I am still learning my SQL so any additional material to look at for building this query would also be appreciated. So I guess the big question here is, How would I go about creating default data in this query for anytime the Location/Submission relation has a null value?
Edit: Dummy Data
QuestionNumberVar | Section | Subsection | Question | AnsYes | AnsNo | NA (expected)
1-1.1 Math Algebra Did you do your homework? 10 1 1(4)
1-1.2 Math Algebra Did your dog eat it? 9 3 0(3)
2-1.1 English Greek Did you do your homework? 8 0 4(7)
I have tried making left joins at various applicable portions of the code to no avail. All attempts at left joins have ended with no effect on info output. This query feeds into the Dataset for an SSRS report. There are a couple workarounds for this particular section via an expression to take total Locations and subtract AnsYes and AnsNo to get the true NA value but as explained above doesn't help with my next query.
Edit: SQL Server 2012 for those who asked
Edit: my attempt at an isnull() on the missing data returns nothing I suspect because the query already eliminates the "null/missing" data. Left joining while doing this has also failed. The point of failure is on Submissions. if we bind it to Locations there are locations missing but if we don't bind it there are multiplied duplicates because Department has a One-To-Many with Location and not vice versa. I am unable to make any schema changes to improve this process.
There is a previous report that I am trying to emulate/update. It used C# logic to process data and run multiple queries to attain the same data. I don't have this luxury. (previous report exports to excel directly instead of SSRS). Here is the previous logic used.
select PK from Department where Name = 'InsertParameter';
select PK from Submission where LocationFK = 'Location.PK_var' and MonthTested = '1/1/2017'
Then it runs those into a loop where it processes nulls into NA using C# logic
EDIT (Mediocre Solution): I ended up doing the workaround of making a calculated field that subtracts Yes and No from the total # of Locations that have that Dept. This is a mediocre solution because I didn't solve my original problem and made 3 datasets that should have been displayed as a singular dataset. One for question info, one for each locations answer and one for locations that didnt participate. If a true answer comes up I will check its validity but for now, Problem psuedo solved.

how to join multiple tables without showing repeated data?

I pop into a problem recently, and Im sure its because of how I Join them.
this is my code:
select LP_Pending_Info.Service_Order,
LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,
LP_Pending_Info.ASC_Code,
LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY,
LP_Part_Codes.PartCode,
LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,
LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes
on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes
on LP_Pending_Info.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes
on LP_Pending_Info.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
For every service order I have 5 part code maximum.
If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin.
for example: this service order"4182134076" has only 2 part code, first'GH81-13601A' and second 'GH96-09938A' so it should show the data 2 time but it repeat it for 8 time. what seems to be the problem?
If your records were exactly the same the distinct keyword would have solved it.
However in rows 2 and 3 which have the same Service_Order and Part_Code if you check the SO_NO you see it is different - that is why distinct won't work here - the rows are not identical.
I say you have some problem in one of the conditions in your joins. The different data is in the SO_NO column so check the raw data in the LP_Confirmation_Codes table for that Service_Order:
select * from LP_Confirmation_Codes where Service_Order = 4182134076
I assume you are missing an and with the value from the LP_Part_Codes or LP_PS_Codes (but can't be sure without seeing those tables and data myself).
By this sentence If the service order have only one value it show the result correctly but when it have more than one Part code the problem begin. - probably you are missing and and with the LP_Part_Codes table
Based on your output result, here are the following data that caused multiple output.
Service Order: 4182134076 has :
2 PartCode which are GH81-13601A and GH96-09938A
2 PS which are U and P
2 SO_NO which are 1.00024e+09 and 1.00022e+09
Therefore 2^3 returns 8 rows. I believe that you need to check where you should join your tables.
Use DINTINCT
select distinct LP_Pending_Info.Service_Order,LP_Pending_Info.Pending_Days,
LP_Pending_Info.Service_Type,LP_Pending_Info.ASC_Code,LP_Pending_Info.Model,
LP_Pending_Info.IN_OUT_WTY, LP_Part_Codes.PartCode,LP_PS_Codes.PS,
LP_Confirmation_Codes.SO_NO,LP_Pending_Info.Engineer_Code
from LP_Pending_Info
join LP_Part_Codes on LP_Pending_Info.Service_order = LP_Part_Codes.Service_order
join LP_PS_Codes on LP_Part_Codes.Service_Order = LP_PS_Codes.Service_Order
join LP_Confirmation_Codes on LP_PS_Codes.Service_Order = LP_Confirmation_Codes.Service_Order
order by LP_Pending_Info.Service_order, LP_Part_Codes.PartCode;
distinct will not return duplicates based on your select. So if a row is same, it will only return once.

How to make a query to obtain only results that have N number within a range of values?

I'm trying to extract nutrient data in MS Access 2007 from the USDA food database, freely available at http://www.ars.usda.gov/Services/docs.htm?docid=24912
I need records that have ALL nutrients from NUT_DATA.Nutr_No . Those records have values between '501' and '511' . But I wish to exclude incomplete records that have missing values.
Currently, Baby food banana has all from nutrient 501 to 511, but Baby food Beverage has only 9 of the nutrients listed, and many others are like that.
As a last resort, I guess it would be acceptable to have all records, showing null for missing values, as long as each FOOD_DES.Long_Desc has exactly 11 records, one for each NUT_DATA.Nutr_No OR NUTR_DEF.NutrDesc (which correspond to each other).
SELECT
FOOD_DES.NDB_No, FOOD_DES.FdGrp_Cd, FOOD_DES.Long_Desc, NUT_DATA.Nutr_No, NUTR_DEF.NutrDesc, NUT_DATA.Nutr_Val, WEIGHT.Amount, WEIGHT.Msre_Desc, WEIGHT.Gm_Wgt, [WEIGHT]![Amount] & " " & [WEIGHT]![Msre_Desc] AS msre
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE
(NUT_DATA.Nutr_No between '501' and '511' ) and ((WEIGHT.Seq)="1") and NUT_DATA.Nutr_Val > '0' and
// this part is me out of ideas trying stuff, but didn't help
EXISTS (SELECT 1
FROM
NUTR_DEF inner JOIN ((FOOD_DES INNER JOIN NUT_DATA ON FOOD_DES.NDB_No=NUT_DATA.NDB_No) INNER JOIN WEIGHT ON FOOD_DES.NDB_No=WEIGHT.NDB_No) ON NUTR_DEF.Nutr_No=NUT_DATA.Nutr_No
WHERE count FOOD_DES.Long_Desc = "11" )
//end wild of experimentation
ORDER BY FOOD_DES.Long_Desc, NUTR_DEF.SR_Order;
This is a sample of the data. I just copied the most important columns. The red is not what I'm looking for because it doesn't have all 11 nutrients. I can paste on the google doc the whole table if someone thinks that would help.
https://docs.google.com/spreadsheets/d/1FghDD59wy2PYlpsqUlYVc3Ulwvy4MMLagpBUYtvLBfI/edit?usp=sharing
As your starting point, identify which food items have values > 0 for all 11 of those nutrients. Check whether this simpler GROUP BY query shows you the correct items:
SELECT ndat.NDB_No
FROM
NUT_DATA AS ndat
INNER JOIN WEIGHT AS wt
ON ndat.NDB_No = wt.NDB_No
WHERE
ndat.Nutr_Val>0
AND ndat.Nutr_No IN('501','502','503','504','505','506','507','508','509','510','511')
AND wt.Seq='1'
GROUP BY ndat.NDB_No
HAVING Count(ndat.Nutr_No)=11;
Note you could use Val(ndat.Nutr_No) Between 501 And 511 as the Nutr_No restriction, which would give you a more concise statement. However, evaluating Val() for every row of the table means that approach would forego the performance benefit of indexed retrieval ... so that version of the query should be noticeably slower.
Save that query and create a new query which joins it to the base tables for the additional data you need from other columns. Or use it as a subquery instead of a named query if you prefer.