How do I INSERT INTO where many fields have their own Select Statements? - sql

I created a table and am in the process of inserting rows from another table into it. However, some of the columns require joins to other tables. To my knowledge, this means using a subquery in the SELECT list. The problem is that a subquery used this way can only return one result, whereas I may have many. I want to return -1 where no record exists. Here is the example I am using, but it is not working:
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
( [MortgageDimID]
,[LeaseDimID]
,[OREODimID]
,[OfficerTypeDimID] )
SELECT
--[MortgageDimID]
-2
--LeaseDimID
,-2
--OREODimID
,-2
,CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER
FROM dbo.Staging_FDB_LN_CPDM_Daily

Try this sql statement
SELECT CASE WHEN OfficerTypeDimID IS NULL THEN -1 ELSE OfficerTypeDimID END
FROM Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN ERMA..OfficerTypeDim OTD on OTD.OfficerNum = LCD.OFFICER

I would rework your query like the following.
First of all, use a LEFT OUTER JOIN in your query instead of the subqueries. This type of join says: a matching row might or might not exist in the "other" table, but I want a row back regardless.
Now that you know you'll have all your rows, you'll want to check whether there is a value there or not. Use the COALESCE function as a shorthand, easier-to-maintain check. It takes a list of values (column names, variables or hard-coded values) and returns the first non-null value in the list. Here we supply -1 as the fallback for your query:
INSERT INTO
[BDW_ReportPrototype].[dbo].[CustomerCreditFact]
(
[OfficerTypeDimID]
)
SELECT
-- coalesce returns the first non-null value
COALESCE(OTD.OfficerTypeDimID, -1) AS OfficerTypeDimID
FROM
dbo.Staging_FDB_LN_CPDM_Daily LCD
LEFT OUTER JOIN
ERMA..OfficerTypeDim OTD
ON OTD.OfficerNum = LCD.OFFICER
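To see the LEFT OUTER JOIN plus COALESCE pattern end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (staging, officer_dim, fact, loan_id) are made up for illustration and are not the real schema; the point is only that an unmatched staging row still gets inserted, with -1 in place of the missing dimension key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the staging, dimension and fact tables.
    CREATE TABLE staging (loan_id INTEGER, officer TEXT);
    CREATE TABLE officer_dim (officer_type_dim_id INTEGER, officer_num TEXT);
    CREATE TABLE fact (loan_id INTEGER, officer_type_dim_id INTEGER);

    INSERT INTO staging VALUES (1, 'A10'), (2, 'B20'), (3, 'ZZZ');
    INSERT INTO officer_dim VALUES (100, 'A10'), (200, 'B20');
""")

# Every staging row is kept by the LEFT OUTER JOIN; COALESCE turns the
# NULL produced by a missing dimension row into -1.
conn.execute("""
    INSERT INTO fact (loan_id, officer_type_dim_id)
    SELECT s.loan_id,
           COALESCE(d.officer_type_dim_id, -1)
    FROM staging AS s
    LEFT OUTER JOIN officer_dim AS d
        ON d.officer_num = s.officer
""")

fact_rows = conn.execute(
    "SELECT loan_id, officer_type_dim_id FROM fact ORDER BY loan_id"
).fetchall()
print(fact_rows)
```

Officer 'ZZZ' has no row in officer_dim, so loan 3 is the one that ends up with -1.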

maybe something along these lines...
INSERT INTO [BDW_ReportPrototype].[dbo].[CustomerCreditFact]
([OfficerTypeDimID])
Select OfficerTypeDimID
from ERMA..OfficerTypeDim OTD
inner JOIN Staging_FDB_LN_CPDM_Daily LCD
on OTD.OfficerNum = LCD.OFFICER
UNION ALL
SELECT -1
FROM dbo.Staging_FDB_LN_CPDM_Daily LCD
WHERE NOT EXISTS
(
Select OfficerTypeDimID from ERMA..OfficerTypeDim
OTD
WHERE
OTD.OfficerNum = LCD.OFFICER
)

Related

Change existing sql to left join only on first match

Adding back some original info for historical purposes as I thought simplifying would help but it didn't. We have this stored procedure, in this part it is selecting records from table A (calldetail_reporting_agents) and doing a left join on table B (Intx_Participant). Apparently there are duplicate rows in table B being pulled that we DON'T want. Is there any easy way to change this up to only pick the first match on table B? Or will I need to rewrite the whole thing?
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
LEFT JOIN Intx_Participant ON calldetail_reporting_agents.CallID = Intx_Participant.CallIDKey
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN @LocStartDate AND @LocEndDate
AND (@LocQueue IS NULL OR AssignedWorkGroup = @LocQueue)
Simpler version: how to change below to select only first matching row from table B:
SELECT columnA, columnB FROM TableA LEFT JOIN TableB ON someColumn
I changed to this per the first answer and all data seems to look exactly as expected now. Thank you to everyone for the quick and attentive help.
SELECT 'Agent Calls' AS CallType,
CallDate,
CallTime,
RemoteNumber,
DialedNumber,
RemoteName,
LocalUserId,
CallDurationSeconds,
Answered,
AnswerSpeed,
InvalidCall,
Intx_Participant.Duration
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM Intx_Participant ip
WHERE calldetail_reporting_agents.CallID = ip.CallIDKey
AND calldetail_reporting_agents.RemoteNumber = ip.ConnValue
AND ip.HowEnded = '9'
AND ip.Recorded = '0'
AND ip.Duration > 0
AND ip.Role = '1') Intx_Participant
WHERE DialedNumber IN ( SELECT DialedNumber
FROM #DialedNumbers )
AND ConnectedDate BETWEEN @LocStartDate AND @LocEndDate
AND (@LocQueue IS NULL OR AssignedWorkGroup = @LocQueue)
You can try to OUTER APPLY a subquery getting only one matching row.
...
FROM calldetail_reporting_agents
OUTER APPLY (SELECT TOP 1
*
FROM intx_Participant ip
WHERE ip.callidkey = calldetail_reporting_agents.callid) intx_participant
WHERE ...
You should add an ORDER BY in the subquery. Otherwise it isn't deterministic which row is taken as the first. Or maybe that's not an issue.
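If you want to experiment with the "first match only" idea outside SQL Server, here is a rough sketch in Python's sqlite3. SQLite has no OUTER APPLY or TOP, so this swaps in a different technique: a LEFT JOIN whose ON clause picks one row via a correlated subquery with ORDER BY and LIMIT 1. The calls and participants tables are hypothetical, and ordering by longest duration is just an assumed tie-break to make the choice deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of table A and table B.
    CREATE TABLE calls (call_id TEXT PRIMARY KEY, dialed_number TEXT);
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY, call_id_key TEXT, duration INTEGER);

    INSERT INTO calls VALUES ('c1', '555-0100'), ('c2', '555-0101');
    INSERT INTO participants (call_id_key, duration)
        VALUES ('c1', 30), ('c1', 45);
""")

# A plain LEFT JOIN repeats call c1 once per matching participant row.
plain_join = conn.execute("""
    SELECT c.call_id, p.duration
    FROM calls AS c
    LEFT JOIN participants AS p ON p.call_id_key = c.call_id
    ORDER BY c.call_id, p.duration
""").fetchall()

# Join only to the single participant row chosen by the subquery.
# The ORDER BY makes "first" well defined (assumed rule: longest duration).
first_match = conn.execute("""
    SELECT c.call_id, p.duration
    FROM calls AS c
    LEFT JOIN participants AS p
        ON p.id = (SELECT p2.id
                   FROM participants AS p2
                   WHERE p2.call_id_key = c.call_id
                   ORDER BY p2.duration DESC, p2.id
                   LIMIT 1)
    ORDER BY c.call_id
""").fetchall()

print(plain_join)
print(first_match)
```

Call c2 has no participant at all and still comes back once, with a NULL duration, which is the same behaviour OUTER APPLY gives.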

SQL Filling In Values From A Second Table

I came up with this query to fill in a missing field from a second table using a subquery.
I cannot modify the original table.
SELECT
CASE WHEN original.target_field IS NULL THEN
(SELECT fill_in.target_field FROM second.table fill_in
WHERE original.id = fill_in.id)
ELSE
original.target_field END AS myField
FROM
primary.table original
I was wondering if I was missing something and if there was a more performant way to do this?
You could use LEFT JOIN and COALESCE instead of correlated subquery:
SELECT COALESCE(original.target_field,fill_in.target_field) AS myField
FROM primary.table original
LEFT JOIN second.table fill_in
ON original.id = fill_in.id
It is always worth testing different methods. But your query should be fine with an appropriate index.
I would write it as:
SELECT (CASE WHEN o.target_field IS NULL
THEN (SELECT f.target_field
FROM second.table f
WHERE o.id = f.id
)
ELSE o.target_field
END) AS myField
FROM primary.table o;
You want an index on second.table(id, target_field). You would want the same index for the LEFT JOIN version.
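For anyone who wants to compare the two forms side by side, here is a small sketch with Python's sqlite3. The tables original and fill_in are simplified stand-ins for primary.table and second.table, and the sketch assumes id is unique in the second table (if it is not, the LEFT JOIN version would return extra rows). It says nothing about performance, only that both queries return the same values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for primary.table and second.table.
    CREATE TABLE original (id INTEGER PRIMARY KEY, target_field TEXT);
    CREATE TABLE fill_in (id INTEGER, target_field TEXT);
    -- The index suggested above: lookup column first, then the value.
    CREATE INDEX fill_in_id_target ON fill_in (id, target_field);

    INSERT INTO original VALUES (1, 'kept'), (2, NULL), (3, NULL);
    INSERT INTO fill_in VALUES (1, 'ignored'), (2, 'filled');
""")

# Correlated subquery form, evaluated only when the original value is NULL.
subquery_rows = conn.execute("""
    SELECT o.id,
           CASE WHEN o.target_field IS NULL
                THEN (SELECT f.target_field
                      FROM fill_in AS f
                      WHERE f.id = o.id)
                ELSE o.target_field
           END AS my_field
    FROM original AS o
    ORDER BY o.id
""").fetchall()

# LEFT JOIN + COALESCE form.
join_rows = conn.execute("""
    SELECT o.id, COALESCE(o.target_field, f.target_field) AS my_field
    FROM original AS o
    LEFT JOIN fill_in AS f ON f.id = o.id
    ORDER BY o.id
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Row 3 has no value in either table, so both forms leave it NULL.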

SQL conditional for a field using multiple subqueries as cases

I am using Proc SQL, but this question should be relevant for all SQL variants. I am trying to populate a field BruceDPOtest with values from two subqueries: if the first query results in a blank (CASE WHEN BruceDPO = INPUT("", 8.)), the blank is filled with another subquery's BruceDPO value:
THEN (
SELECT SUM(PART_QTY) FROM RSCCParts LEFT JOIN DPO.DPO_PART_ORD_HST AS Total
ON RSCCParts.PartID = STRIP(Total.PART_NO_ID)
WHERE PUT(PROC_DT, YY.) LIKE '%2016%' GROUP BY PART_NO_ID) ELSE BruceDPO END
For example, the first query gives the following results;
Part DPO
1234 100
1235
The second subquery that references data that can populate the second row is run to get:
Part DPO
1234 100
1235 999
Here is the full code:
PROC SQL;
CREATE VIEW DPOMergeView AS(SELECT *,
CASE
WHEN BruceDPO = INPUT("", 8.) THEN (
SELECT SUM(PART_QTY) FROM RSCCParts LEFT JOIN DPO.DPO_PART_ORD_HST AS Total
ON RSCCParts.PartID = STRIP(Total.PART_NO_ID)
WHERE PUT(PROC_DT, YY.) LIKE '%2016%' GROUP BY PART_NO_ID)
ELSE BruceDPO
END
AS BruceDPOtest
FROM
RSCCParts
LEFT JOIN (SELECT RSCCParts.PartID AS BrucePartID, BruceDPO, Year
FROM RSCCParts
LEFT JOIN
(SELECT PART_NO_ID AS PartNumber, SUM(PART_QTY) AS BruceDPO, STRIP(YR) AS Year
FROM
DPO.DPO_PART_HST_MAIN
WHERE YR = '2016'
GROUP BY PartNumber, Year) AS FQuery
ON
RSCCParts.PartID = STRIP(FQuery.PartNumber)) AS B
ON RSCCParts.PartID = B.BrucePartID);
QUIT;
As I run this query, it gets stuck on DATA Step and after 30 minutes, I stopped the query. Am I doing this correctly? If there is a better way to do this please let me know!
Normally I avoid correlated subqueries in SQL, since they make it feel like you are trying to process the data record by record instead of combining sets. But if you did want to use syntax like
case when (x) then (sub query result) else variable_name end
then the subquery needs to return only one value. Your query
SELECT SUM(PART_QTY)
FROM RSCCParts LEFT JOIN DPO.DPO_PART_ORD_HST AS Total
ON RSCCParts.PartID = STRIP(Total.PART_NO_ID)
WHERE PUT(PROC_DT, YY.) LIKE '%2016%'
GROUP BY PART_NO_ID
looks like it will return multiple observations since you are using a GROUP BY clause.
Shouldn't that subquery look more like
SELECT SUM(Total.PART_QTY)
FROM DPO.DPO_PART_ORD_HST AS Total
WHERE RSCCParts.PartID = STRIP(Total.PART_NO_ID)
AND PUT(PROC_DT, YY.) LIKE '%2016%'
Your query has multiple references to RSCCPARTS table so you might need to introduce an alias to each so that you can clarify which one you want to use to get PARTID from to match to PART_NO_ID.
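As a rough illustration of the single-value rule, here is a sketch in Python's sqlite3 rather than Proc SQL. The tables parts and order_history and the proc_year column are simplified, hypothetical stand-ins for RSCCParts and DPO.DPO_PART_ORD_HST. The subquery has no GROUP BY; it is correlated to the outer row through its WHERE clause, so it yields exactly one sum per part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables; dpo is NULL where the first
    -- source had no value for the part.
    CREATE TABLE parts (part_id TEXT PRIMARY KEY, dpo INTEGER);
    CREATE TABLE order_history (
        part_no_id TEXT, part_qty INTEGER, proc_year INTEGER);

    INSERT INTO parts VALUES ('1234', 100), ('1235', NULL);
    INSERT INTO order_history VALUES
        ('1234', 70, 2016),
        ('1235', 900, 2016),
        ('1235', 99, 2016),
        ('1235', 5, 2015);
""")

# The alias p makes it unambiguous which table the subquery correlates to.
rows = conn.execute("""
    SELECT p.part_id,
           CASE WHEN p.dpo IS NULL
                THEN (SELECT SUM(h.part_qty)
                      FROM order_history AS h
                      WHERE h.part_no_id = p.part_id
                        AND h.proc_year = 2016)
                ELSE p.dpo
           END AS dpo_test
    FROM parts AS p
    ORDER BY p.part_id
""").fetchall()

print(rows)
```

Part 1234 keeps its existing 100, and part 1235 is filled with the 2016 total from the second source.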

SQL query: Iterate over values in table and use them in subquery

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
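Stripped of the formation logic, the CROSS JOIN idea can be tried out with a toy schema. This is only a sketch in Python's sqlite3, not PostgreSQL, and the tables round_values, coach and points are invented for the illustration. Unlike the query above, it also groups by the value so that each (value, coach) pair gets its own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy schema, far simpler than the real one.
    CREATE TABLE round_values (value INTEGER);
    CREATE TABLE coach (id INTEGER PRIMARY KEY);
    CREATE TABLE points (
        coach_id INTEGER, round_sequence INTEGER, round_points INTEGER);

    INSERT INTO round_values VALUES (1), (2);
    INSERT INTO coach VALUES (10), (20);
    INSERT INTO points VALUES (10, 1, 5), (10, 2, 7), (20, 2, 3);
""")

# CROSS JOIN yields one row per (coach, value) pair; the value then takes
# the place of the hardcoded constant in the join condition.
rows = conn.execute("""
    SELECT v.value, c.id, COALESCE(SUM(p.round_points), 0) AS total
    FROM coach AS c
    CROSS JOIN round_values AS v
    LEFT JOIN points AS p
        ON p.coach_id = c.id
       AND p.round_sequence = v.value
    GROUP BY v.value, c.id
    ORDER BY v.value, c.id
""").fetchall()

print(rows)
```

Coach 20 has no points in round 1, so the LEFT JOIN keeps the pair and the sum falls back to 0.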
In the example I create two tables. Notice how the outer table has an alias that you use in the inner SELECT.
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation into a function that is called in the SELECT clause and executed for each row of the result set.
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION DoComputation(@parameter1 int)
RETURNS int
AS
BEGIN
-- Do some calculations here and return the function result.
-- This example returns the value of @parameter1 squared.
-- You can add additional parameters to the function definition if needed
DECLARE @Result int
SET @Result = @parameter1 * @parameter1
RETURN @Result
END
Here is an example of using the example function above in a query.
SELECT v.value, dbo.DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY value
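The same per-row function idea can be tried without a SQL Server instance. The sketch below uses Python's sqlite3, where a function is registered from the host language with Connection.create_function instead of CREATE FUNCTION; the names do_computation and input_values are hypothetical.

```python
import sqlite3


def do_computation(parameter1):
    """Return parameter1 squared, mirroring the T-SQL example above."""
    return parameter1 * parameter1


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name; 1 is the argument count.
conn.create_function("do_computation", 1, do_computation)

conn.executescript("""
    -- Hypothetical table holding the sample values from the question.
    CREATE TABLE input_values (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO input_values VALUES (0, 4), (1, 7), (2, 9);
""")

# The function is called once for each row of the result set.
rows = conn.execute("""
    SELECT v.value, do_computation(v.value) AS computed_value
    FROM input_values AS v
    ORDER BY v.value
""").fetchall()

print(rows)
```

Whether a per-row function performs acceptably depends on the database and the amount of data, so treat this as a way to test the logic rather than a recommendation.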

SQL Having Clause

I'm trying to get a stored procedure to work using the following syntax:
select count(sl.Item_Number)
as NumOccurrences
from spv3SalesDocument as sd
left outer join spv3saleslineitem as sl on sd.Sales_Doc_Type = sl.Sales_Doc_Type and
sd.Sales_Doc_Num = sl.Sales_Doc_Num
where
sd.Sales_Doc_Type='ORDER' and
sd.Sales_Doc_Num='OREQP0000170' and
sl.Item_Number = 'MCN-USF'
group by
sl.Item_Number
having count (distinct sl.Item_Number) = 0
In this particular case when the criteria is not met the query returns no records and the 'count' is just blank. I need a 0 returned so that I can apply a condition instead of just nothing.
I'm guessing it is a fairly simple fix but beyond my simple brain capacity.
Any help is greatly appreciated.
Wally
First, having a specific where clause on sl defeats the purpose of the left outer join -- it basically turns it into an inner join.
It sounds like you are trying to return 0 if there are no matches. I'm a T-SQL programmer, so I don't know if this will be meaningful in other flavors... and I don't know enough about the context for this query, but it sounds like you are trying to use this query for branching in an IF statement... perhaps this will help you on your way, even if it is not quite what you're looking for...
IF NOT EXISTS (SELECT 1 FROM spv3SalesDocument as sd
INNER JOIN spv3saleslineitem as sl on sd.Sales_Doc_Type = sl.Sales_Doc_Type
and sd.Sales_Doc_Num = sl.Sales_Doc_Num
WHERE sd.Sales_Doc_Type='ORDER'
and sd.Sales_Doc_Num='OREQP0000170'
and sl.Item_Number = 'MCN-USF')
BEGIN
-- Do something...
END
I didn't test these but off the top of my head give them a try:
select ISNULL(count(sl.Item_Number), 0) as NumOccurrences
If that one doesn't work, try this one:
select
CASE count(sl.Item_Number)
WHEN NULL THEN 0
WHEN '' THEN 0
ELSE count(sl.Item_Number)
END as NumOccurrences
This combination of group by and having looks pretty suspicious:
group by sl.Item_Number
having count (distinct sl.Item_Number) = 0
I'd expect this having condition to approve only groups where Item_Number is null.
To always return a row, use a union. For example:
select name, count(*) as CustomerCount
from customers
group by
name
having count(*) > 1
union all
select 'No one found!', 0
where not exists
(
select *
from customers
group by
name
having count(*) > 1
)
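Applied to a count like the one in the question, the union trick might look like the sketch below. It runs on Python's sqlite3 with a single made-up table sales_line, so the names and data are assumptions and the SQL is SQLite's dialect, not T-SQL. The grouped query alone returns no rows when nothing matches; the UNION ALL branch supplies a 0 in exactly that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical single-table version of the sales line items.
    CREATE TABLE sales_line (sales_doc_num TEXT, item_number TEXT);
    INSERT INTO sales_line VALUES ('DOC1', 'OTHER-ITEM');
""")

# With GROUP BY, no matching rows means no groups, hence no result row.
grouped = conn.execute("""
    SELECT COUNT(item_number)
    FROM sales_line
    WHERE sales_doc_num = 'DOC1' AND item_number = 'MCN-USF'
    GROUP BY item_number
""").fetchall()

# The second branch contributes a 0 only when the first finds nothing.
with_fallback = conn.execute("""
    SELECT COUNT(item_number) AS num_occurrences
    FROM sales_line
    WHERE sales_doc_num = 'DOC1' AND item_number = 'MCN-USF'
    GROUP BY item_number
    UNION ALL
    SELECT 0
    WHERE NOT EXISTS (SELECT 1
                      FROM sales_line
                      WHERE sales_doc_num = 'DOC1'
                        AND item_number = 'MCN-USF')
""").fetchall()

print(grouped)
print(with_fallback)
```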