SQL subquery based on row values with unrelated table - sql

I need to get a count of records in an unrelated table, based on the row values in a query with some moderately complex joins. All data is on one server in a single SQL 2012 database, on several different tables.
I am recreating ticket movement history for a single ticket at a time, from audit records and need to calculate business days for the spans in rows created by the joins. Tickets are moved around between areas (ASSIGNMENT), and there are guidelines on how long it should be at any one area. The ticket may go to the same area multiple times with each time restarting the time count.
I need to consider company holidays in the business day calculations. After looking at several solutions for business day calculations on SE I decided to go with a company calendar table (dbo.UPMCCALENDARM1) and count the dates between spans. Seemed like a great idea...
I can't figure out how to use the row values as parameters for the date count query.
The query below has working solutions with a variable and with a cross join, but both only work with hard-coded dates. If I try to use the field values instead, it fails, because those columns are not part of the subquery and cannot be bound.
-- between DV_im_Audit_ASSIGNMENT.Time and Detail.RESOLVED_TIME
In theory I could probably get there by repeating this full query inside the subquery to get the date count, but this is as short as I can make it and still get clean data. That would be a heavy lift for an on-demand report, so it is my last option. What I want is to look up UPMCCALENDARM1 for each row, using that row's DV_im_Audit_ASSIGNMENT.Time and Detail.RESOLVED_TIME.
Can it be done? If so how?
declare @NonBus integer
set @NonBus = '0'
set @NonBus = (select Count(UPMCCALENDARM1.DATE) as NonBus
from dbo.UPMCCALENDARM1
where UPMC_BUSINESS_DAY = 'f'
and UPMCCALENDARM1.DATE
between '2015-08-01' and '2015-08-31'
-- between DV_im_Audit_ASSIGNMENT.Time and Detail.RESOLVED_TIME
)
select DV_im_Audit_ASSIGNMENT.Incident_ID
, DV_im_Audit_ASSIGNMENT.Old_ASSIGNMENT
, DV_im_Audit_ASSIGNMENT.New_ASSIGNMENT
, DV_im_Audit_ASSIGNMENT.Time as Assign_Time
, B.Time as Reassign_Time
, Detail.OPEN_TIME
, Cal.NonBus
, NonBus
, Detail.RESOLVED_TIME
, A.rownumA
, B.rownumB
from dbo.DV_im_Audit_ASSIGNMENT
--Get RownumA as a select join so I can work with it here, else get an invalid column name 'rownumA' error
left join(select Incident_ID
, Old_ASSIGNMENT
, New_ASSIGNMENT
, [Time]
, rownumA = ROW_NUMBER() OVER (ORDER BY DV_im_Audit_ASSIGNMENT.Incident_ID, DV_im_Audit_ASSIGNMENT.Time)
from dbo.DV_im_Audit_ASSIGNMENT
where Incident_ID = ?
) as A
on DV_im_Audit_ASSIGNMENT.Incident_ID = A.Incident_ID
and DV_im_Audit_ASSIGNMENT.New_ASSIGNMENT = A.New_ASSIGNMENT
and DV_im_Audit_ASSIGNMENT.Time = A.Time
--Get time assigned to next group, is problematic when assigned to the same group multiple times.
left join(select Incident_ID
, Old_ASSIGNMENT
, New_ASSIGNMENT
, [Time]
, rownumB = ROW_NUMBER() OVER (ORDER BY DV_im_Audit_ASSIGNMENT.Incident_ID, DV_im_Audit_ASSIGNMENT.Time)
from dbo.DV_im_Audit_ASSIGNMENT
where Incident_ID = ?
) as B
on DV_im_Audit_ASSIGNMENT.Incident_ID = B.Incident_ID
and DV_im_Audit_ASSIGNMENT.New_ASSIGNMENT = B.Old_ASSIGNMENT
and DV_im_Audit_ASSIGNMENT.Time < B.Time
and rownumA = (B.rownumB - 1)
--Get current ticket info
left join (select Incident_ID
, OPEN_TIME
, RESOLVED_TIME
from dbo.DV_im_PROBSUMMARYM1_Detail
where Incident_ID = ?
) as Detail
on DV_im_Audit_ASSIGNMENT.Incident_ID = Detail.Incident_ID
--Count non-business days. This section is in testing and does not use dataview as a source.
-- this gets the date count for one group of dates, need a different count for each row based on assign time.
cross join (Select Count(UPMCCALENDARM1.DATE) as NonBus
from dbo.UPMCCALENDARM1
where UPMC_BUSINESS_DAY = 'f'
and UPMCCALENDARM1.DATE
between '2015-08-01' and '2015-08-30'
-- between DV_im_Audit_ASSIGNMENT.Time and Detail.RESOLVED_TIME
) as Cal
--Get data for one ticket
where DV_im_Audit_ASSIGNMENT.Incident_ID = ?
ORDER BY DV_im_Audit_ASSIGNMENT.Incident_ID, DV_im_Audit_ASSIGNMENT.Time
Results
FYI: I am running this SQL through BIRT 4.2, and I believe there are a few SQL constructs that will not pass through BIRT.

Following the suggestion by @Dominique, I created a custom scalar function (using the wizard in SSMS). I used default values for the dates because I had started by experimenting with a stored procedure, and the defaults made it easier to test. This problem requires a function because a function returns a value per row, which a stored procedure will not.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: James Jenkins
-- Create date: September 2015
-- Description: Counts Business Days for UPMC during a span of dates
-- =============================================
CREATE FUNCTION dbo.UPMCBusinessDayCount
(
-- Add the parameters for the function here
@StartDate date = '2015-08-01',
@EndDate date = '2015-08-31'
)
RETURNS int
AS
BEGIN
-- Declare the return variable here
DECLARE @BusDay int
-- Add the T-SQL statements to compute the return value here
SELECT @BusDay = Count(UPMCCALENDARM1.DATE)
from dbo.UPMCCALENDARM1
where UPMC_BUSINESS_DAY = 't'
and UPMCCALENDARM1.DATE between @StartDate and @EndDate
-- Return the result of the function
RETURN @BusDay
END
GO
After the function is created in the database I added these two lines to my select statement, and it works perfectly.
--Custom function counts business days on UPMCCALENDARM1
, dbo.UPMCBusinessDayCount(DV_im_Audit_ASSIGNMENT.Time, Detail.RESOLVED_TIME) as BusDay
I can use this function for any span that has date data, in this query or any other query on the database. I will probably remove the default dates and add a third parameter to count non-business days (UPMC_BUSINESS_DAY = 'f'). But as it stands, the problem is solved.
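For comparison, the per-row count can probably also be written inline, without a function. A derived table in a CROSS JOIN cannot see columns from the outer query, but CROSS APPLY can, so the calendar lookup can use each row's own dates. The sketch below is untested against the real schema: it leaves out the rownum joins for brevity, assumes [Time] and RESOLVED_TIME are datetime values (hence the casts to date, so that the assignment day itself is counted), and keeps the `?` placeholder used in the question. Whether BIRT 4.2 passes CROSS APPLY through is also an assumption.

```sql
-- Sketch only: count non-business days per row with CROSS APPLY.
SELECT a.Incident_ID
     , a.New_ASSIGNMENT
     , a.[Time]          AS Assign_Time
     , d.RESOLVED_TIME
     , cal.NonBus
FROM dbo.DV_im_Audit_ASSIGNMENT AS a
LEFT JOIN dbo.DV_im_PROBSUMMARYM1_Detail AS d
       ON d.Incident_ID = a.Incident_ID
-- The applied subquery is evaluated per outer row, so it may reference a and d.
CROSS APPLY (
    SELECT COUNT(c.[DATE]) AS NonBus
    FROM dbo.UPMCCALENDARM1 AS c
    WHERE c.UPMC_BUSINESS_DAY = 'f'
      AND c.[DATE] BETWEEN CAST(a.[Time] AS date)
                       AND CAST(d.RESOLVED_TIME AS date)
) AS cal
WHERE a.Incident_ID = ?
ORDER BY a.Incident_ID, a.[Time];
```

A ticket that is not yet resolved (RESOLVED_TIME is NULL) gets a count of 0 here, because the BETWEEN predicate matches no calendar rows.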

Related

SQL Query to pull date range based on dates in a table

I have some SQL code which is contained in an SSRS report and when run pulls a list of student detentions for a set period such as a week or month but I have been asked to get the report to run automatically from the start of the current school term to the date the report has been run. Is this possible? We have 3 terms per year and the dates change each year. The report has multiple subscriptions which will run weekly and filter to students in particular day houses and years so we ideally need the report to update itself.
We have a table in our database titled TblSchoolManagementTermDates which includes txtStartDate and txtFinishDate columns for each term.
The date of the detention is stored in the column detPpl.dDetentionDate
The full SQL code I am currently using is:
SELECT ppl.txtSchoolID AS SchoolID,
detPpl.TblDisciplineManagerDetentionsPupilsID AS DetentionID,
ppl.txtSurname AS Surname,
ppl.txtForename AS Forename,
ppl.txtPrename AS PreferredName,
ppl.intNCYear AS Year,
ppl.txtAcademicHouse AS House,
schTermDates.intSchoolYear AS AcademicYear,
schTerms.txtName AS TermName,
CAST(schTermDates.intSchoolYear AS CHAR(4)) + '/' +
RIGHT(CAST(schTermDates.intSchoolYear + 1 AS CHAR(4)), 2) AS AcademicYearName,
detPpl.dDetentionDate AS DetentionDate,
detSessions.txtSessionName AS DetentionName,
detPpl.txtOffenceDescription AS OffenceDescription,
LEFT(Staff.Firstname, 1) + '. ' + Staff.Surname AS PutInBy,
detPpl.intPresent AS AttendedDetention
FROM dbo.TblPupilManagementPupils AS ppl
INNER JOIN
dbo.TblDisciplineManagerDetentionsPupils AS detPpl
ON detPpl.txtSchoolID = ppl.txtSchoolID
INNER JOIN
dbo.TblDisciplineManagerDetentionsSessions AS detSessions
ON detPpl.intDetentionSessionID = detSessions.TblDisciplineManagerDetentionsSessionsID
INNER JOIN
dbo.TblStaff AS Staff
ON Staff.User_Code = detPpl.txtSubmittedBy
INNER JOIN
dbo.TblSchoolManagementTermDates AS schTermDates
ON detPpl.dDetentionDate BETWEEN schTermDates.txtStartDate AND schTermDates.txtFinishDate
INNER JOIN
dbo.TblSchoolManagementTermNames AS schTerms
ON schTermDates.intTerm = schTerms.TblSchoolManagementTermNamesID
LEFT OUTER JOIN
dbo.TblDisciplineManagerDetentionsCancellations AS Cancelled
ON Cancelled.intSessionID = detPpl.intDetentionSessionID
AND Cancelled.dDetDate = detPpl.dDetentionDate
WHERE (ppl.txtAcademicHouse = 'Challoner') AND (Cancelled.TblDisciplineManagerDetentionsCancellationsID IS NULL) AND (CAST(detPpl.dDetentionDate AS DATE) >= CAST (GETDATE()-28 AS DATE))
ORDER BY ppl.txtSurname, ppl.txtForename, detPpl.dDetentionDate
What you need is to add a couple of parameters to this code.
Let's call the parameters @term_start and @term_end.
In your where clause you simply need to remove this piece
AND (CAST(detPpl.dDetentionDate AS DATE) >= CAST (GETDATE()-28 AS DATE))
and add this piece in
AND (CAST(detPpl.dDetentionDate AS DATE) BETWEEN @term_start AND @term_end)
Now create another dataset based on your term dates; let's call the dataset term_dates.
It would look something like this. (I'm making up these column names because I don't know which columns are available and have no sample data, so adapt the idea to your requirements.)
select
min(term_start_date) as start_date
,max(term_end_date) as end_date
from TblSchoolManagementTermNames
where convert(date,getdate()) between term_start_date and term_end_date
Now your report should have two parameters, and you simply need to set their default values.
Set the default value of @term_start to start_date and of @term_end to end_date from your term_dates dataset.
Run your report. You should get the data between the term dates.
This should work, unless I've misunderstood the requirement.
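Using the table and column names given in the question (TblSchoolManagementTermDates with txtStartDate and txtFinishDate), the term_dates dataset might look like the sketch below. It assumes those columns hold date or datetime values despite the txt prefix, and that terms do not overlap.

```sql
-- Sketch only: find the term that contains today's date.
SELECT MIN(td.txtStartDate)  AS start_date
     , MAX(td.txtFinishDate) AS end_date
FROM dbo.TblSchoolManagementTermDates AS td
WHERE CAST(GETDATE() AS date) BETWEEN td.txtStartDate AND td.txtFinishDate;
```

If the report runs between terms, no row matches and both values come back as NULL, so the subscriptions would need a fallback for school holidays.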

How can I get the first instance of an event per day with multiple columns including a datetime and return those columns plus the full datetime value?

I need to generate a SQL script that will pull out Distinct entries using a number of columns, one of which is a datetime column. I am only interested in the first occurrence of the day per event and the query needs to span multiple days. The query will be run against a very large database and can potentially be returning hundreds of thousands of results if not millions. Therefore I need this script to be as efficient as possible as well. This will eventually be a script running in SSRS to pull access transactions.
I've tried using GROUP BY, DISTINCT, subqueries, FIRST, and such without success. All the examples I can find online don't have JOIN statements or calculated columns such as only gathering the date from a datetime field.
I've simplified the below script some to only pull one day and one door, but the prod will be multiple days and doors. This code returns the data I need, I don't care about the COUNT, but I also need to get the (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) field in my result set as well somehow. The problem is since it goes down to the second it makes all records DISTINCT.
DECLARE @Begin datetime2 = '4/10/2019',
@End datetime2 = '4/11/2019',
@Door varchar(max) = 'Front Entrance'
SELECT
CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) AS 'Date'
,AJ.PrimaryObjectIdentity
,AJ.SecondaryObjectIdentity
,AJ.MessageType
,AJ.PrimaryObjectName
,AJ.SecondaryObjectName
,AP.Text13
,COUNT(*) AS 'Count'
FROM Access.JournalLogView AJ
LEFT OUTER JOIN Access.Personnel as AP on AP.GUID = AJ.PrimaryObjectIdentity
WHERE (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
AND (SecondaryObjectName IN (@Door))
GROUP BY CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101)
,PrimaryObjectIdentity
,SecondaryObjectIdentity
,MessageType
,PrimaryObjectName
,SecondaryObjectName
,Text13
ORDER BY AJ.PrimaryObjectName
I want to get the columns called out in the SELECT statement plus the datetime which includes the second. Again I also want the most efficient way of pulling this data as well. Thank you very much.
Assuming PrimaryObjectIdentity is the key that identifies the person in JournalLogView, and ServerLocaleOffset is the datetime column in that table, I have written this:
DECLARE @Begin datetime2 = '4/10/2019',
@End datetime2 = '4/11/2019',
@Door varchar(max) = 'Front Entrance';
WITH cte
AS(
SELECT
ROW_NUMBER() OVER
(PARTITION BY PrimaryObjectIdentity,CONVERT(VARCHAR(10), (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)),101) ORDER BY ServerLocaleOffset) AS row_num,
--whatever the columns you want here
*
FROM
Access.JournalLogView)
SELECT
DateAdd(minute,-(ServerLocaleOffset),ServerUTC) AS 'DateTime'
,AJ.PrimaryObjectIdentity
,AJ.SecondaryObjectIdentity
,AJ.MessageType
,AJ.PrimaryObjectName
,AJ.SecondaryObjectName
,AP.Text13
--I guess count(*) won't be of use as we are selecting only the first row
,COUNT(*) AS 'Count'
FROM cte AJ
LEFT OUTER JOIN
Access.Personnel as AP
on
AP.GUID = AJ.PrimaryObjectIdentity
WHERE
AJ.row_num = 1
AND (MessageType like 'CardAdmitted' OR MessageType like 'CardRejected')
AND (DateAdd(minute,-(ServerLocaleOffset),ServerUTC)) BETWEEN @Begin AND @End
AND (SecondaryObjectName IN (@Door))
GROUP BY (DateAdd(minute,-(ServerLocaleOffset),ServerUTC))
,PrimaryObjectIdentity
,SecondaryObjectIdentity
,MessageType
,PrimaryObjectName
,SecondaryObjectName
,Text13
ORDER BY AJ.PrimaryObjectName
In this query, I have used PARTITION BY to partition the table by user and date, and ROW_NUMBER() to number each user's rows within a date, starting from that user's first entry. Any row with row_num = 1 is therefore the first entry of that user on that date (the same condition I have used in the WHERE clause). Hope this helps :)
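A variation on the same idea, as an untested sketch: apply the message, date and door filters inside the CTE so that row 1 is the first matching event, and order by the computed local time rather than by the offset. It assumes ServerUTC is a datetime and ServerLocaleOffset is an offset in minutes, and it treats "event" as the combination of person, door and message type; adjust the PARTITION BY list if that is not the intent.

```sql
-- Sketch only: first matching event per person, door, message type and local day.
DECLARE @Begin datetime2 = '2019-04-10',
        @End   datetime2 = '2019-04-11',
        @Door  varchar(max) = 'Front Entrance';

WITH cte AS (
    SELECT AJ.PrimaryObjectIdentity
         , AJ.SecondaryObjectIdentity
         , AJ.MessageType
         , AJ.PrimaryObjectName
         , AJ.SecondaryObjectName
         , t.LocalTime
         , ROW_NUMBER() OVER (
               PARTITION BY AJ.PrimaryObjectIdentity
                          , AJ.SecondaryObjectIdentity
                          , AJ.MessageType
                          , CAST(t.LocalTime AS date)
               ORDER BY t.LocalTime
           ) AS row_num
    FROM Access.JournalLogView AS AJ
    -- Compute the local timestamp once so it can be reused in the filter and the window.
    CROSS APPLY (SELECT DATEADD(minute, -(AJ.ServerLocaleOffset), AJ.ServerUTC) AS LocalTime) AS t
    WHERE AJ.MessageType IN ('CardAdmitted', 'CardRejected')
      AND t.LocalTime BETWEEN @Begin AND @End
      AND AJ.SecondaryObjectName IN (@Door)
)
SELECT c.LocalTime AS [DateTime]
     , c.PrimaryObjectIdentity
     , c.SecondaryObjectIdentity
     , c.MessageType
     , c.PrimaryObjectName
     , c.SecondaryObjectName
     , AP.Text13
FROM cte AS c
LEFT OUTER JOIN Access.Personnel AS AP
       ON AP.GUID = c.PrimaryObjectIdentity
WHERE c.row_num = 1
ORDER BY c.PrimaryObjectName, c.LocalTime;
```

Because only one row per partition survives, no GROUP BY or COUNT(*) is needed, and the full timestamp is returned alongside the other columns.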

SQL Server Arithmetic Overflow Error - converting nvarchar to datetime

I'm retrieving data from a view, and one of the columns I'm using is nvarchar(50), but is only ever N'True' or N'False', depending on the operation of a related date column in this parent view.
The following code retrieves the record ID and the column I'm looking for, YTD:
SELECT Enquiry_Number, YTD
FROM dbo.vw_SalesPO_YTD
Output:
ENQ-001 True
ENQ-002 False
ENQ-003 True
However, I'm unable to filter my results using this YTD column for some reason. If I attempt to do this:
SELECT Enquiry_Number, YTD
FROM dbo.vw_SalesPO_YTD
WHERE YTD = N'True'
Then it fails with the following error:
Arithmetic overflow error converting expression to data type datetime.
Which I don't understand because there are no datetime expressions in play in this query. Yes the True or False was determined by comparing datetimes in the parent view, but I don't understand how that might have trickled down to this subquery. Attempting the same thing in the parent view yields the same error - I'm demonstrating it this way for simplicity's sake.
However, performing a similar operation in the SELECT portion of the query works without issues:
SELECT
Enquiry_Number,
YTD,
CASE
WHEN YTD = N'True' THEN 1 ELSE 0
END As C
FROM
dbo.vw_SalesPO_YTD
Output:
ENQ-001 True 1
ENQ-002 False 0
ENQ-003 True 1
However, these 1's and 0's inherit the same flaw, where I can't use them in a WHERE clause without getting this datetime error.
I've been searching hard and am not sure how to identify the core issue. I've been reading things about Collations and type precedence, but can't understand why this behaviour is happening.
When I've checked YTD in INFORMATION_SCHEMA.COLUMNS, it confirms that this column is no different from other columns in my table: YTD is nvarchar(50), using the Latin1_General_CI_AS collation.
Related question: SQL Server Arithmetic overflow error converting expression to data type datetime
The Source of the Problem
This issue is still unsolved, but if you wish to reproduce it, this code from the parent view must be generating this issue:
CASE WHEN
Award_Date <= DATEFROMPARTS(FinancialYear - 1, 11, 1) + GETDATE() - DATEADD(year, DATEDIFF(month, '20161101', GETDATE()) / 12, '20161101')
THEN N'True'
ELSE N'False'
END
Yes this looks overly complicated. We're checking the Award_Date against its associated FinancialYear, which runs from November 1st to October 31st. Each record already knows which FinancialYear it's in. The ultimate aim is to compare TODAY's position (2016-11-30) against TODAY last year (2015-11-30), and TODAY the year before (2014-11-30), etc.
So the code takes today's date and combines it with the FinancialYear for the associated record, and spits out whether the record had occurred between the start of its financial year and the today of the same year. And it's doing this successfully, but then I can't do anything with the N'True' or N'False' it's producing.
I do not know what the type of the source YTD is.
Try using the following:
SELECT Enquiry_Number, [YTD] FROM (
SELECT Enquiry_Number, CONVERT(nvarchar(10),YTD) AS [YTD] FROM dbo.vw_SalesPO_YTD
) AS A
WHERE A.YTD = N'True'
10 is just a thumb-suck value (an arbitrary guess). It will cut off any part of the field longer than 10 characters, so the right length depends on your actual field size.
SELECT *
FROM (SELECT Enquiry_Number, YTD
FROM dbo.vw_SalesPO_YTD) AS A
WHERE cast(A.YTD as varchar) = 'True'
I used the following as an example:
DECLARE @Data TABLE
(
Enquiry_Number nvarchar(10),
YTD nvarchar(50)
)
INSERT INTO @Data(Enquiry_Number, YTD)
SELECT N'ENQ-001', N'True' UNION
SELECT N'ENQ-002', N'False' UNION
SELECT N'ENQ-003', N'True'
SELECT Enquiry_Number, [YTD] FROM (
SELECT Enquiry_Number, CONVERT(nvarchar(10),YTD) AS [YTD] FROM @Data
) AS A
WHERE A.YTD = N'True'
Result:
ENQ-001 True
ENQ-003 True
There must be values in the YTD field that cause it to be treated as a datetime type.
Try a query like:
SELECT * FROM dbo.vw_SalesPO_YTD WHERE ISDATE(YTD)= 1
Updated Question:
Try:
ISNULL(CASE WHEN Award_Date <= DATEFROMPARTS(FinancialYear - 1, 11, 1) + GETDATE() - DATEADD(year, DATEDIFF(month, '20161101', GETDATE()) / 12, '20161101') THEN N'True' ELSE N'False' END, 'false')
Try this: don't add the N prefix before the string.
SELECT *
FROM (SELECT Enquiry_Number, YTD
FROM dbo.vw_SalesPO_YTD) AS A
WHERE A.YTD = 'True'
This is a successful workaround, not a solution per se or explanation of the core issue.
There's obviously an issue with the data coming out of the view, which will not allow YTD column's results to be operated on in a WHERE clause, and yet they can be operated on by the time the query reaches its SELECT phase.
I've created a new table which explicitly defines the YTD column as nvarchar(50), and then inserted all the records from my view into this table, which has resolved the issue. The records can then be sorted and filtered by YTD as they are supposed to.
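The thread never pins down the cause. One thing that may be worth trying (an untested guess, not a confirmed fix) is to avoid adding and subtracting datetime values directly in the view, and to build the cutoff with DATEDIFF and DATEADD on date values instead. The sketch below uses hypothetical variables (@Today, @FinancialYear, @Award_Date) in place of GETDATE() and the view's columns so that it runs on its own.

```sql
-- Sketch only: same YTD test, expressed with date arithmetic functions.
DECLARE @Today         date = '2016-11-30';  -- stand-in for CAST(GETDATE() AS date)
DECLARE @FinancialYear int  = 2015;          -- hypothetical record value
DECLARE @Award_Date    date = '2014-11-15';  -- hypothetical record value

-- Start of the financial year containing @Today (same expression as the view).
DECLARE @CurrentFYStart date =
    DATEADD(year, DATEDIFF(month, '20161101', @Today) / 12, '20161101');

-- Whole days elapsed since the current financial year began.
DECLARE @DaysIntoFY int = DATEDIFF(day, @CurrentFYStart, @Today);

-- Shift that many days into the record's own financial year and compare.
SELECT CASE
           WHEN @Award_Date <= DATEADD(day, @DaysIntoFY,
                                       DATEFROMPARTS(@FinancialYear - 1, 11, 1))
           THEN N'True'
           ELSE N'False'
       END AS YTD;
```

With these sample values the cutoff is 2014-11-30, so the result is True. This version compares whole days, whereas the original expression also carries the time of day from GETDATE().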

Improving performance of outer apply

Let me briefly describe what I'm attempting in case someone has a much more elegant way of solving the same problem. I'm trying to write a stored procedure that looks at sales orders in a database, find when the same item is ordered by the same customer multiple times, and predict the next date of an order using an average of the previous intervals between orders for the same item. The query below is going to form the basis for the temp table to work against with probably cursors and running averages.
So far the query I have looks like this
SELECT sl.custaccount ,
sl.itemid ,
sl.shippingdaterequested ,
nextdate.shippingdaterequested AS nextshippingdaterequested
FROM salesline AS sl
OUTER APPLY ( SELECT TOP 1
sl2.custaccount ,
sl2.itemid ,
sl2.shippingdaterequested
FROM salesline AS sl2
WHERE sl2.shippingdaterequested > sl.shippingdaterequested
AND sl2.custaccount = sl.custaccount
AND sl2.itemid = sl.itemid
GROUP BY sl2.custaccount ,
sl2.itemid ,
sl2.shippingdaterequested
ORDER BY sl2.shippingdaterequested
) AS nextdate
GROUP BY sl.custaccount ,
sl.itemid ,
sl.shippingdaterequested ,
nextdate.shippingdaterequested
This query gives me a row for every sales line with a column representing the next time that item was ordered by that customer. If that column is NULL, I know the record I'm on is the last time.
The basic problem is that this query is way too slow, it runs fine if I go against a single customer at a time, returning results in a second, but running against ~100,000 customers would take around 27 hours.
I know the basic problem is that I'm outer applying, so it's probably doing row-by-agonizing-row processing, but I'm not sure of another way to get there that would work out faster. Any thoughts?
I think you are making it more complex than it needs to be.
Just take the span between the first and last order and divide by the number of intervals (the count minus one)
SELECT sl.custaccount ,
sl.itemid ,
MAX(sl.shippingdaterequested) AS lastShip ,
DATEDIFF(dd, MIN(sl.shippingdaterequested),
MAX(sl.shippingdaterequested)) / (COUNT(*) - 1) AS interval ,
DATEADD(dd,
DATEDIFF(dd, MIN(sl.shippingdaterequested),
MAX(sl.shippingdaterequested)) / (COUNT(*) - 1),
MAX(sl.shippingdaterequested)) AS nextShip
FROM salesline AS sl
GROUP BY sl.custaccount ,
sl.itemid
HAVING COUNT(*) > 1
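If one row per order with the following order date is still wanted (for example, to inspect the individual intervals), a window function may avoid the per-row APPLY altogether. This sketch assumes SQL Server 2012 or later, where LEAD is available; the server version is not stated in the question.

```sql
-- Sketch only: next requested ship date per customer and item via LEAD.
SELECT d.custaccount
     , d.itemid
     , d.shippingdaterequested
     , LEAD(d.shippingdaterequested) OVER (
           PARTITION BY d.custaccount, d.itemid
           ORDER BY d.shippingdaterequested
       ) AS nextshippingdaterequested
FROM (
    -- Collapse duplicate dates first, as the GROUP BY in the original query does.
    SELECT DISTINCT custaccount, itemid, shippingdaterequested
    FROM salesline
) AS d;
```

As in the original query, the last order for each customer and item gets NULL in nextshippingdaterequested.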

Calculate Average after populating a temp table

I have been tasked with figuring out the average length of time that our customers stick with us. (Specifically from the date they become a customer, to when they placed their last order.)
I am not 100% sure that I am doing this properly, but my thought was to gather the date we enter the customer into the database, and then head over to the order table and grab their most recent order date, dump them into a temp table, and then figure out the length of time between those two dates, and then tally an average based on that number.
(I have to do some other wibbly wobbly time stuff as well, but this is the one that's kicking my butt.)
The end goal with this is to be able to say "On Average our customers stick with us for 4 years, and 3 months." (Or whatever the data shows it to be.)
SELECT * INTO #AvgTable
FROM(
SELECT DISTINCT (c.CustNumber) AS [CustomerNumber]
, COALESCE(convert( VARCHAR(10),c.OrgEnrollDate,101),'') AS [StartDate]
, COALESCE(CONVERT(VARCHAR(10),MAX(co.OrderDate),101),'')AS [EndDate]
,DATEDIFF(DD,c.OrgEnrollDate, co.OrderDate) as [LengthOfTime]
FROM dbo.Customer c
JOIN dbo.CustomerOrder co ON c.ID = co.CustomerID
WHERE c.Archived = 0
AND co.Archived =0
AND c.OrgEnrollDate IS NOT NULL
AND co.OrderDate IS NOT NULL
GROUP BY c.CustNumber
, co.OrderDate 2
)
--This is where I start falling apart
Select AVG[LengthofTime]
From #AvgTable
If I understand you correctly, then just try
SELECT AVG(DATEDIFF(dd, StartDate, EndDate)) AvgTime
FROM #AvgTable
My guess is that since you are storing the data in a temp table, that the integer result of the datediff is being implicitly converted back to a datetime (which you cannot do an average on).
Don't store the average in your temp table (don't even have a temp table, but that is a whole different conversation). Just do the differencing in your select.
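A minimal sketch of doing it without the temp table, using the table and column names from the question. It assumes OrgEnrollDate and OrderDate are date or datetime columns and that each CustNumber has a single OrgEnrollDate.

```sql
-- Sketch only: average customer tenure in days, computed in one statement.
SELECT AVG(DATEDIFF(day, t.StartDate, t.EndDate) * 1.0) AS AvgDays
FROM (
    -- One row per customer: enrollment date and most recent order date.
    SELECT c.CustNumber
         , c.OrgEnrollDate   AS StartDate
         , MAX(co.OrderDate) AS EndDate
    FROM dbo.Customer AS c
    JOIN dbo.CustomerOrder AS co
      ON co.CustomerID = c.ID
    WHERE c.Archived = 0
      AND co.Archived = 0
      AND c.OrgEnrollDate IS NOT NULL
      AND co.OrderDate IS NOT NULL
    GROUP BY c.CustNumber
           , c.OrgEnrollDate
) AS t;
```

Multiplying by 1.0 keeps the fractional part, since AVG over an int expression returns a truncated int. Dividing AvgDays by 365.25 gives an approximate figure in years.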