SQL Server : combine SELECT and related UDF results


I failed to google this scenario: I would like to insert new rows into a time-slot table for some records (selected by a WHERE clause), adding 3 columns that come from a user-defined function (UDF) which calculates a free slot's date, start time, and end time. This has to work even if the UDF returns more than one row.
Based on Microsoft's suggestion about using UDF:
SELECT ContactID, FirstName, LastName, JobTitle, ContactType
FROM dbo.ufnGetContactInformation(1209);
I came up with this concept:
INSERT INTO PlanTimeSlots (........................)
SELECT
PRJ.ID as RID,
GST.SlotDate as SlotDate,
GST.SlotStart as TimeStart,
GST.SlotEnd as TimeEnd,
PRJ.WPGroupID as WPGroupID,
45 as Priority
FROM
PlanRJ as PRJ
LEFT JOIN
(SELECT
SlotDate, SlotStart, SlotEnd
FROM
dbo.GetSuitableTimeSlot(PRJ.ID, PRJ.WPGroupID,
PRJ.DateReqBy, PRJ.DurationMin)) AS GST ON GST.JID = PRJ.ID
WHERE
........;
So I redundantly pass an RID to the UDF, which is returned as GST.JID, so there's a key to join UDFs result set to the main select.
Is this OK, or is there a better solution? It will work with hundreds to thousands of entries, and I'm not sure whether this concept will perform well.

1. The query result also depends on your WHERE condition.
2. If you want every record from the left-hand query, keep the LEFT JOIN; otherwise change it to an INNER JOIN.
3. Treat your user-defined function like any other table; there is no need to wrap it in a SELECT statement.
INSERT INTO PlanTimeSlots (........................)
SELECT
PRJ.ID as RID,
GST.SlotDate as SlotDate,
GST.SlotStart as TimeStart,
GST.SlotEnd as TimeEnd,
PRJ.WPGroupID as WPGroupID,
45 as Priority
FROM
PlanRJ as PRJ
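-- A table-valued function whose arguments come from PRJ must be invoked with APPLY;
-- SQL Server cannot bind PRJ columns inside the function call of a plain JOIN.
-- CROSS APPLY acts like INNER JOIN; use OUTER APPLY to keep PRJ rows with no slot (like LEFT JOIN).
CROSS APPLY
dbo.GetSuitableTimeSlot(PRJ.ID, PRJ.WPGroupID,
PRJ.DateReqBy, PRJ.DurationMin) AS GST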
WHERE
........;

Related

How do I display only one result (the highest) with SQL query? (Beginner)

I need help making the following query only display one result, the one with the MAX Procurement Rate.
Currently the query works, but displays all results not just the one with the output of the MAX function
SELECT SalesPeople.SalesPersonID, FirstName, LastName, Region, SalesRevenueYear1, ProcurementCost
FROM ProductRevenueAndCosts
INNER JOIN SalesPeople
ON ProductRevenueAndCosts.SalesPersonID = SalesPeople.SalesPersonID
WHERE SalesPeople.Region = 'Central' AND (
SELECT MAX (ProcurementCost)
FROM ProductRevenueAndCosts
WHERE SalesPeople.Region = 'Central'
)

If you add a LIMIT 1 clause at the end of your query, only the first record is returned. If you add ORDER BY column_name, the results are sorted by that column (add DESC to put the highest value first). Using the two together is a quick way to get the row with the max or min without having to worry about aggregate functions.
https://www.w3schools.com/mysql/mysql_limit.asp
Otherwise, you can try aggregating the results with a max function:
https://www.w3schools.com/mysql/mysql_min_max.asp
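The ORDER BY plus LIMIT 1 approach can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module, and the sample rows and the trimmed-down column list are made up. SQL Server has no LIMIT clause; SELECT TOP 1 with the same ORDER BY plays that role there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the tables from the question.
conn.executescript("""
CREATE TABLE SalesPeople (SalesPersonID INTEGER PRIMARY KEY, FirstName TEXT, Region TEXT);
CREATE TABLE ProductRevenueAndCosts (SalesPersonID INTEGER, ProcurementCost REAL);
INSERT INTO SalesPeople VALUES (1, 'Ann', 'Central'), (2, 'Bob', 'Central'), (3, 'Cy', 'East');
INSERT INTO ProductRevenueAndCosts VALUES (1, 100.0), (2, 250.0), (3, 900.0);
""")

# Sort the Central rows by cost, highest first, and keep only the first one.
top_row = conn.execute("""
SELECT sp.SalesPersonID, sp.FirstName, prc.ProcurementCost
FROM ProductRevenueAndCosts AS prc
INNER JOIN SalesPeople AS sp ON prc.SalesPersonID = sp.SalesPersonID
WHERE sp.Region = 'Central'
ORDER BY prc.ProcurementCost DESC
LIMIT 1
""").fetchone()

print(top_row)
```

Note that LIMIT 1 returns a single row even when several rows tie for the highest value.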

As mentioned, you need to correlate the subquery to outer query. Be sure to use aliases between same named columns and exercise good practice in qualifying all columns with table names or aliases especially in JOIN queries:
SELECT sp.SalesPersonID, sp.FirstName, sp.LastName, sp.Region, sp.SalesRevenueYear1,
prc.ProcurementCost
FROM ProductRevenueAndCosts prc
INNER JOIN SalesPeople sp
ON prc.SalesPersonID = sp.SalesPersonID
WHERE sp.Region = 'Central'
AND prc.ProcurementCost = ( -- CORRELATE OUTER QUERY WITH SUBQUERY
SELECT MAX(ProcurementCost)
FROM ProductRevenueAndCosts
)
Note: If running in MS Access, remove the comment
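One thing to watch in the query above: the subquery takes MAX over the whole table, so if the overall maximum belongs to another region, no Central row matches. A minimal sketch of restricting the subquery to the outer row's region, run through Python's sqlite3 module with made-up sample rows (the aliases prc2 and sp2 and the reduced column list are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: the overall maximum (900) is in the East region.
conn.executescript("""
CREATE TABLE SalesPeople (SalesPersonID INTEGER PRIMARY KEY, FirstName TEXT, Region TEXT);
CREATE TABLE ProductRevenueAndCosts (SalesPersonID INTEGER, ProcurementCost REAL);
INSERT INTO SalesPeople VALUES
  (1, 'Ann', 'Central'), (2, 'Bob', 'Central'), (3, 'Cy', 'East'), (4, 'Dee', 'Central');
INSERT INTO ProductRevenueAndCosts VALUES (1, 100.0), (2, 250.0), (3, 900.0), (4, 250.0);
""")

# The subquery is correlated on Region, so MAX is computed per region.
rows = conn.execute("""
SELECT sp.SalesPersonID, sp.FirstName, prc.ProcurementCost
FROM ProductRevenueAndCosts AS prc
INNER JOIN SalesPeople AS sp ON prc.SalesPersonID = sp.SalesPersonID
WHERE sp.Region = 'Central'
  AND prc.ProcurementCost = (
      SELECT MAX(prc2.ProcurementCost)
      FROM ProductRevenueAndCosts AS prc2
      INNER JOIN SalesPeople AS sp2 ON prc2.SalesPersonID = sp2.SalesPersonID
      WHERE sp2.Region = sp.Region
  )
ORDER BY sp.SalesPersonID
""").fetchall()

print(rows)
```

Unlike LIMIT 1, this form returns every row that ties for the maximum.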

SQL - Difference between FROM(subquery) and WHERE - IN(subquery)

I would like to ask about the difference between the following two SQL statements.
The first one is working correctly, but the second one not. When I "create a new table" from subquery then result is correct, but if I use the same subquery in WHERE-IN statement then I get a different result.
SELECT `T`.`city`, COUNT(*)
FROM (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
) AS T
GROUP BY `T`.`city`
ORDER BY `COUNT(*)` ASC
///////////////////////////////////
SELECT `address`.`city`, COUNT(*)
FROM `address`
WHERE `address`.`city` IN (
SELECT `address`.`city`
FROM `address`
INNER JOIN `person` ON `person`.`address_id`=`address`.`address_id`
INNER JOIN `person_detail` ON `person_detail`.`person_detail_id`=`person`.`person_detail_id`
WHERE (`person_detail`.`phone` LIKE '%+42056%') OR (`person_detail`.`phone` LIKE '%+42057%')
)
GROUP BY `address`.`city`
ORDER BY `COUNT(*)`;

The first query runs the subquery first, which returns one 'city' row for every person whose phone matches (not a distinct list). Grouping that result by city therefore counts the matching persons in each city. In essence you are running your query off of the subquery result (not the address table itself).
Your second query uses the subquery only as a filter: it goes back to the original address table, returns every address row whose city appears in the list, then groups by city and counts those address rows. This leads to a different result, since you are counting rows of the original table rather than rows of the subquery result.
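A small reproduction may make the difference concrete. The sketch below uses Python's sqlite3 module; the table layout follows the question, but the rows are invented and the ORDER BY is changed to sort by city so the output is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: three Brno addresses, but only address 1 has persons.
conn.executescript("""
CREATE TABLE address (address_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE person_detail (person_detail_id INTEGER PRIMARY KEY, phone TEXT);
CREATE TABLE person (person_id INTEGER PRIMARY KEY, address_id INTEGER, person_detail_id INTEGER);
INSERT INTO address VALUES (1, 'Brno'), (2, 'Brno'), (3, 'Brno'), (4, 'Praha'), (5, 'Olomouc');
INSERT INTO person_detail VALUES
  (1, '+420561111'), (2, '+420572222'), (3, '+420603333'), (4, '+420564444');
INSERT INTO person VALUES (1, 1, 1), (2, 1, 2), (3, 4, 3), (4, 4, 4);
""")

# The subquery shared by both statements: one city row per matching person.
matching_cities = """
SELECT address.city
FROM address
INNER JOIN person ON person.address_id = address.address_id
INNER JOIN person_detail ON person_detail.person_detail_id = person.person_detail_id
WHERE person_detail.phone LIKE '%+42056%' OR person_detail.phone LIKE '%+42057%'
"""

# FROM (subquery): counts the rows of the subquery, i.e. matching persons per city.
from_counts = conn.execute(
    f"SELECT T.city, COUNT(*) FROM ({matching_cities}) AS T GROUP BY T.city ORDER BY T.city"
).fetchall()

# WHERE ... IN (subquery): counts address rows whose city is in the list.
in_counts = conn.execute(
    f"SELECT address.city, COUNT(*) FROM address WHERE address.city IN ({matching_cities}) "
    "GROUP BY address.city ORDER BY address.city"
).fetchall()

print(from_counts)
print(in_counts)
```

With this data Brno has two matching persons but three address rows, so the two statements disagree on its count.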

Create table that is table 1 minus table 2 based on three criteria

I have a table of LoggedDischarges and another table of ActualDischarges.
I am trying to generate a query that will give me all the fields from ActualDischarges excluding those already in LoggedDischarges based on AgencyID, Program and ActivityEndDate
A client can be in multiple programs and be discharged from multiple on the same day. I need to make sure I get LoggedDischarges from each program.
This is what I have but am not sure how to add the other criteria.
select * from ActualDischarges
where (agencychildid ) not in
(select agencyid from LoggedDischarges)
Thank you,
Steve Hathaway

Even if your DBMS supports multiple columns in a subquery like
where (AgencyID, Program, ActivityEndDate) not in
( select AgencyID, Program, ActivityEndDate
from ... )
you had better switch to NOT EXISTS (in case of any NULLs):
select * from ActualDischarges as aD
where NOT EXISTS
(select * from LoggedDischarges as lD
where aD.AgencyID = lD.AgencyID
and aD.Program = lD.Program
and aD.ActivityEndDate= lD.ActivityEndDate)
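To see why NULLs matter, here is a minimal sketch using Python's sqlite3 module. The rows are invented, and both tables are assumed to share the column names AgencyID, Program and ActivityEndDate as in the query above. A single NULL in the NOT IN list makes the comparison unknown for every non-matching row, so nothing is returned, while NOT EXISTS is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; note the NULL AgencyID in LoggedDischarges.
conn.executescript("""
CREATE TABLE ActualDischarges (AgencyID INTEGER, Program TEXT, ActivityEndDate TEXT);
CREATE TABLE LoggedDischarges (AgencyID INTEGER, Program TEXT, ActivityEndDate TEXT);
INSERT INTO ActualDischarges VALUES
  (1, 'A', '2020-01-01'), (1, 'B', '2020-01-01'), (2, 'A', '2020-02-01');
INSERT INTO LoggedDischarges VALUES
  (1, 'A', '2020-01-01'), (NULL, 'C', '2020-03-01');
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so the row is filtered out.
not_in_rows = conn.execute("""
SELECT * FROM ActualDischarges
WHERE AgencyID NOT IN (SELECT AgencyID FROM LoggedDischarges)
""").fetchall()

# NOT EXISTS: compares all three columns and simply never matches the NULL row.
not_exists_rows = conn.execute("""
SELECT * FROM ActualDischarges AS aD
WHERE NOT EXISTS (
    SELECT * FROM LoggedDischarges AS lD
    WHERE aD.AgencyID = lD.AgencyID
      AND aD.Program = lD.Program
      AND aD.ActivityEndDate = lD.ActivityEndDate)
ORDER BY aD.AgencyID, aD.Program
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```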

For this type of match, I would recommend a LEFT JOIN with an IS NULL at the end to determine that the second table does not have the record:
SELECT a.*
FROM ActualDischarges AS a
LEFT JOIN LoggedDischarges AS l
ON l.agencyid=a.agencychildid
AND a.program=l.program
AND a.ActivityEndDate=l.ActivityEndDate
WHERE l.agencyid IS NULL
As a side note, definitely avoid chaining several NOT IN conditions for situations like this (WHERE ... NOT IN (...) AND ... NOT IN (...) and so on). Each condition is checked against LoggedDischarges independently, so you end up excluding records that match different LoggedDischarges rows for different reasons, which is rarely the desired result.
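The difference between the LEFT JOIN form and chained NOT IN conditions can be sketched with Python's sqlite3 module. The rows are invented; the column names agencychildid (in ActualDischarges) and agencyid (in LoggedDischarges) are taken from the question, and their placement in those tables is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: only (1, 'A', '2020-01-01') has actually been logged.
conn.executescript("""
CREATE TABLE ActualDischarges (agencychildid INTEGER, program TEXT, ActivityEndDate TEXT);
CREATE TABLE LoggedDischarges (agencyid INTEGER, program TEXT, ActivityEndDate TEXT);
INSERT INTO ActualDischarges VALUES
  (1, 'A', '2020-01-01'), (1, 'B', '2020-01-01'), (2, 'A', '2020-02-01');
INSERT INTO LoggedDischarges VALUES
  (1, 'A', '2020-01-01'), (3, 'B', '2020-02-01');
""")

# Anti-join: a row survives when no logged row matches on all three columns.
anti_join_rows = conn.execute("""
SELECT a.*
FROM ActualDischarges AS a
LEFT JOIN LoggedDischarges AS l
  ON l.agencyid = a.agencychildid
 AND l.program = a.program
 AND l.ActivityEndDate = a.ActivityEndDate
WHERE l.agencyid IS NULL
ORDER BY a.agencychildid, a.program
""").fetchall()

# Chained NOT IN: each column is tested on its own, so unrelated logged rows
# are enough to exclude a discharge that was never logged.
chained_rows = conn.execute("""
SELECT * FROM ActualDischarges
WHERE agencychildid NOT IN (SELECT agencyid FROM LoggedDischarges)
  AND program NOT IN (SELECT program FROM LoggedDischarges)
  AND ActivityEndDate NOT IN (SELECT ActivityEndDate FROM LoggedDischarges)
""").fetchall()

print(anti_join_rows)
print(chained_rows)
```

Here the chained version returns nothing, even though two of the three discharges were never logged.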

Alternative for joining two tables multiple times

I have a situation where I have to join a table multiple times. Most of the joins need to be left joins, since some of the values are not available. How can I overcome the poor query performance when joining multiple times?
The Scenario
Tables
[Project]: ProjectId Guid, Name VARCHAR(MAX).
[UDF]: EntityId Guid, EntityType Char(1), UDFCode Guid, UDFName varchar(20)
[UDFDetail]: UDFCode Guid, Description VARCHAR(MAX)
Relationship:
[Project].ProjectId - [UDF].EntityId
[UDFDetail].UDFCode - [UDF].UDFCode
The UDF table holds custom fields for projects, based on the UDFName column. The value for these fields, however, is stored on the UDFDetail, in the column Description.
I have lots of custom columns for Project, and they are stored in the UDF table.
So for example, to get two fields for the project I do the following select:
SELECT
p.Name ProjectName,
ud1.Description Field1,
ud1.UDFCode Field1Id,
ud2.Description Field2,
ud2.UDFCode Field2Id
FROM
Project p
LEFT JOIN UDF u1 ON
u1.EntityId = p.ProjectId AND u1.UDFName='Field1'
LEFT JOIN UDFDetail ud1 ON
ud1.UDFCode = u1.UDFCode
LEFT JOIN UDF u2 ON
u2.EntityId = p.ProjectId AND u2.UDFName='Field2'
LEFT JOIN UDFDetail ud2 ON
ud2.UDFCode = u2.UDFCode
The Problem
Imagine the above select but joining with like 15 fields. In my query I have around 10 fields already and the performance is not very good. It is taking about 20 seconds to run. I have good indexes for these tables, so looking at the execution plan, it is doing only index seeks without any lookups. Regarding the joins, it needs to be left join, because Field 1 might not exist for that specific project.
The Question
Is there a more performant way to retrieve the data?
How would you do the query to retrieve 10 different fields for one project in a schema like this?

Your choices are pivot, explicit aggregation (with conditional functions), or the joins. If you have the appropriate indexes set up, the joins may be the fastest method.
The correct index would be UDF(EntityId, UDFName, UDFCode).
You can test if the group by is faster by running a query such as:
SELECT count(*)
FROM Project p LEFT JOIN
UDF u1
ON u1.EntityId = p.ProjectId LEFT JOIN
UDFDetail ud1
ON ud1.UDFCode = u1.UDFCode;
If this runs fast enough, then you can consider the group by approach.

You can try this rather odd contraption (it does not look pretty, but it needs only a single set of outer joins). The intermediate result is a very "wide" and "long" dataset, which we can then "compact" with aggregation. For example, for each ProjectName the Field1 column has N rows, N-1 of them NULL and 1 non-NULL, and a simple MAX aggregation picks out the non-NULL one (N is the number of fields).
select ProjectName, max(Field1) as Field1, max(Field1Id) as Field1Id, max(Field2) as Field2, max(Field2Id) as Field2Id
from (
select
p.Name as ProjectName,
case when u.UDFName='Field1' then ud.Description else NULL end as Field1,
case when u.UDFName='Field1' then ud.UDFCode else NULL end as Field1Id,
case when u.UDFName='Field2' then ud.Description else NULL end as Field2,
case when u.UDFName='Field2' then ud.UDFCode else NULL end as Field2Id
from Project p
left join UDF u on p.ProjectId=u.EntityId
left join UDFDetail ud on u.UDFCode=ud.UDFCode
) tmp
group by ProjectName
The query can actually be rewritten without the inner query, but that should not make a big difference :), and looking at Gordon Linoff's suggestion and your answer, it might actually take just about 20 seconds as well, but it is still worth giving a try.
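For reference, the variant without the inner query can be sketched like this. It runs through Python's sqlite3 module with made-up rows and text keys instead of GUIDs, and it groups by ProjectId as well as Name, which is an assumption added here so that two projects with the same name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: project Beta has no Field2.
conn.executescript("""
CREATE TABLE Project (ProjectId TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE UDF (EntityId TEXT, UDFName TEXT, UDFCode TEXT);
CREATE TABLE UDFDetail (UDFCode TEXT PRIMARY KEY, Description TEXT);
INSERT INTO Project VALUES ('p1', 'Alpha'), ('p2', 'Beta');
INSERT INTO UDF VALUES ('p1', 'Field1', 'c1'), ('p1', 'Field2', 'c2'), ('p2', 'Field1', 'c3');
INSERT INTO UDFDetail VALUES ('c1', 'red'), ('c2', 'large'), ('c3', 'blue');
""")

# One pass over UDF/UDFDetail; each CASE keeps only the rows for its field,
# and MAX collapses the NULLs from the other fields.
rows = conn.execute("""
SELECT p.Name AS ProjectName,
       MAX(CASE WHEN u.UDFName = 'Field1' THEN ud.Description END) AS Field1,
       MAX(CASE WHEN u.UDFName = 'Field2' THEN ud.Description END) AS Field2
FROM Project p
LEFT JOIN UDF u ON u.EntityId = p.ProjectId
LEFT JOIN UDFDetail ud ON ud.UDFCode = u.UDFCode
GROUP BY p.ProjectId, p.Name
ORDER BY p.Name
""").fetchall()

print(rows)
```

A missing field simply comes back as NULL (None in Python), which matches what the chain of left joins would produce.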

How do I write an SQL query to identify duplicate values in a specific field?

This is the table I'm working with:
I would like to identify only the ReviewIDs that have duplicate deduction IDs for different parameters.
For example, in the image above, ReviewID 114 has two different parameter IDs, but both records have the same deduction ID.
For my purposes, this record (ReviewID 114) has an error. There should not be two or more unique parameter IDs that have the same deduction ID for a single ReviewID.
I would like to write a query to identify these types of records, but my SQL skills aren't there yet. Help?
Thanks!
Update 1: I'm using TSQL (SQL Server 2008) if that helps
Update 2: The output that I'm looking for would be the same as the image above, minus any records that do not match the criteria I've described.
Cheers!

SELECT * FROM table t1 INNER JOIN (
SELECT review_id, deduction_id FROM table
GROUP BY review_id, deduction_id
HAVING COUNT(parameter_id) > 1
) t2 ON t1.review_id = t2.review_id AND t1.deduction_id = t2.deduction_id;
http://www.sqlfiddle.com/#!3/d858f/3
If it is possible to have exact duplicates and that is ok, you can modify the HAVING clause to COUNT(DISTINCT parameter_id).

Select ReviewID, deduction_ID from Table
Group By ReviewID, deduction_ID
Having count(ReviewID) > 1
http://www.sqlfiddle.com/#!3/6e113/3 has an example

If I understand the criteria: For each combination of ReviewID and deduction_id you can have only one parameter_id and you want a query that produces a result without the ReviewIDs that break those rules (rather than identifying those rows that do). This will do that:
;WITH review_errors AS (
SELECT ReviewID
FROM test
GROUP BY ReviewID,deduction_ID
HAVING COUNT(DISTINCT parameter_id) > 1
)
SELECT t.*
FROM test t
LEFT JOIN review_errors r
ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
To explain: review_errors is a common table expression (think of it as a named sub-query that doesn't clutter up the main query). It selects the ReviewIDs that break the criteria. When you left join on it, it selects all rows from the left table regardless of whether they match the right table and only the rows from the right table that match the left table. Rows that do not match will have nulls in the columns for the right-hand table. By specifying WHERE r.ReviewID IS NULL you eliminate the rows from the left hand table that match the right hand table.
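A small run of the same idea, using Python's sqlite3 module with invented rows in a table named test as above: ReviewID 114 breaks the rule, 115 is fine, and 116 contains an exact duplicate, which COUNT(DISTINCT parameter_id) does not treat as an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the three cases described above.
conn.executescript("""
CREATE TABLE test (ReviewID INTEGER, deduction_ID INTEGER, parameter_id INTEGER);
INSERT INTO test VALUES
  (114, 5, 1), (114, 5, 2),
  (115, 5, 1), (115, 6, 2),
  (116, 7, 3), (116, 7, 3);
""")

# The CTE lists ReviewIDs where one deduction_ID has several distinct parameters.
review_errors_cte = """
WITH review_errors AS (
    SELECT ReviewID
    FROM test
    GROUP BY ReviewID, deduction_ID
    HAVING COUNT(DISTINCT parameter_id) > 1
)
"""

bad_review_ids = conn.execute(
    review_errors_cte + "SELECT ReviewID FROM review_errors"
).fetchall()

# Anti-join: keep only rows whose ReviewID is not among the errors.
clean_rows = conn.execute(review_errors_cte + """
SELECT t.*
FROM test t
LEFT JOIN review_errors r ON t.ReviewID = r.ReviewID
WHERE r.ReviewID IS NULL
ORDER BY t.ReviewID, t.deduction_ID, t.parameter_id
""").fetchall()

print(bad_review_ids)
print(clean_rows)
```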
