sql while loop duplicating results - sql

I have a rather large and complex query that works out which people are in work, off sick, etc. It works well when I only want to see a single day, but I now need to let users view multiple days.
I added startdate and enddate parameters and tried building a SQL WHILE loop that advances the start date on each pass and writes the values into a temp table, so I can pull them out at the end. This may not be the best approach.
I have got the loop working, but it keeps duplicating the results, as in the example below:
How the data should look:
Date: Value
01/01/2014 1
02/01/2014 2
03/01/2014 3
How data is being exported:
Date: Value
01/01/2014 1
02/01/2014 1
02/01/2014 2
03/01/2014 1
03/01/2014 2
03/01/2014 3
This is the example of the loop I found, which I have used with my own SQL code in the middle. My SQL code only uses the startdate parameter being passed in.
Should I be using a different type of loop, or have I missed something that would stop the duplication? Any suggestions are welcome, as I'm not sure how to stop the loop doing this. It is bringing back the correct data; I just need to exclude the duplicates.
Structure of my code and loop (not the full example, as the code in the middle is very long):
CREATE TABLE #TestTable1
(
    Date DATETIME,
    Value int
);
declare @startdate datetime
declare @enddate datetime
while @startdate <= @enddate
BEGIN
    -- (My SQL code is placed here and uses the @startdate parameter)
    INSERT INTO #TestTable1(Date, Value)
    select * from (uses a lot of temp tables and CTEs from the code I have used.)
    SET @startdate = DATEADD(DAY, 1, @startdate)
END
select * from #TestTable1
drop table #TestTable1
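The output pattern in the question (one extra row on each pass) is what you would see if one of the intermediate temp tables filled inside the loop is never emptied between passes. That is only a guess, since the middle of the query is not shown. The Python sketch below uses the standard sqlite3 module to reproduce the pattern; the `scratch` and `results` tables and the one-row-per-day calculation are hypothetical stand-ins for the real query.

```python
import sqlite3
from datetime import date, timedelta

def run_loop(start, end, clear_scratch):
    """Run a day-by-day loop; optionally empty the scratch table on each pass."""
    con = sqlite3.connect(":memory:")
    # 'scratch' stands in for the temp tables used by the long query (assumption).
    con.execute("CREATE TABLE scratch (value INTEGER)")
    con.execute("CREATE TABLE results (day TEXT, value INTEGER)")
    day = start
    while day <= end:
        if clear_scratch:
            # Without this, rows from earlier passes are still in scratch.
            con.execute("DELETE FROM scratch")
        # Hypothetical per-day calculation: one row holding the day of the month.
        con.execute("INSERT INTO scratch (value) VALUES (?)", (day.day,))
        con.execute(
            "INSERT INTO results (day, value) SELECT ?, value FROM scratch",
            (day.isoformat(),),
        )
        day += timedelta(days=1)
    rows = con.execute(
        "SELECT day, value FROM results ORDER BY day, value"
    ).fetchall()
    con.close()
    return rows

buggy = run_loop(date(2014, 1, 1), date(2014, 1, 3), clear_scratch=False)
fixed = run_loop(date(2014, 1, 1), date(2014, 1, 3), clear_scratch=True)
```

If this is the cause, the T-SQL equivalent would be to truncate or drop and recreate each intermediate temp table at the top of the WHILE body.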

Related

Adding x work days onto a date in SQL Server?

I'm a bit confused if there is a simple way to do this.
I have a field called receipt_date in my data table and I wish to add 10 working days to this (with bank holidays).
I'm not sure whether there is a query I could join onto this table to calculate the date 10 working days after receipt_date. I've tried a few subqueries but couldn't get it right, or perhaps it's not possible. Is there a way to pick the 10th row after the receipt date, so I get the calendar date, if I only include 'Y' rows in the WHERE clause?
Any help appreciated.
This makes several assumptions about your data, because we have none. One method, however, would be to create a function (I use an inline table-valued function here) that returns the relevant row from your calendar table. Note that this assumes that the number of days is never negative, and that if you provide a date that isn't a working day, day 0 is the next working day. I.e. adding zero working days to 2021-09-05 would return 2021-09-06, and adding 3 would return 2021-09-09. If that isn't what you want, this should be more than enough for you to get there yourself.
CREATE FUNCTION dbo.AddWorkingDays (@Days int, @Date date)
RETURNS TABLE AS
RETURN
    WITH Dates AS(
        SELECT CalendarDate,
               WorkingDay
        FROM dbo.CalendarTable
        WHERE CalendarDate >= @Date)
    SELECT CalendarDate
    FROM Dates
    WHERE WorkingDay = 1
    ORDER BY CalendarDate
    OFFSET @Days ROWS FETCH NEXT 1 ROW ONLY;
GO
--Using the function
SELECT YT.DateColumn,
       AWD.CalendarDate AS AddedWorkingDays
FROM dbo.YourTable YT
     CROSS APPLY dbo.AddWorkingDays(10,YT.DateColumn) AWD;
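To make the offset logic easier to check, here is the same idea as a small Python sketch. It is not a replacement for the function above: `build_calendar` is a hypothetical helper that treats weekends (and any dates you pass as holidays) as non-working, which is an assumption about what the calendar table contains.

```python
from datetime import date, timedelta

def build_calendar(first, last, holidays=()):
    """Return [(day, is_working)]; weekends and listed holidays are non-working."""
    rows = []
    day = first
    while day <= last:
        rows.append((day, day.weekday() < 5 and day not in holidays))
        day += timedelta(days=1)
    return rows

def add_working_days(calendar, start, days):
    """Mirror of OFFSET @Days ROWS FETCH NEXT 1 ROW ONLY over working days >= start."""
    working = sorted(d for d, is_working in calendar if is_working and d >= start)
    # Like the SQL version, nothing is returned if the calendar runs out of rows.
    return working[days] if 0 <= days < len(working) else None

calendar = build_calendar(date(2021, 9, 1), date(2021, 9, 30))
zero_days = add_working_days(calendar, date(2021, 9, 5), 0)   # start is a Sunday
three_days = add_working_days(calendar, date(2021, 9, 5), 3)
```

With this calendar, the two calls reproduce the dates quoted in the answer: 2021-09-06 for zero days and 2021-09-09 for three.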

Big Query Loop through Start and End date

I have a table with Start and end date with the Interval of 6 months. Below is one example:
Row start_date end_date
1 2018-09-18 2019-03-18
2 2019-03-18 2019-09-18
3 2019-09-18 2020-03-18
I have a master table (which is very big), so I have to loop through these start_date and end_date values and insert the selected records into a different table. Below is the sample query.
create table dataset.t1 (v1,v2,v3,create_dt);
LOOP
insert into dataset.t1 (v1,v2,v3,create_dt) select v1,v2,v3,create_dt
from dataset.t2 where create_dt >= (select start_date from dataset.t1)
and create_date < (select end_date from dataset.t1)
END LOOP.
When I tried the loop, I got the error below:
Query error: Scalar subquery produced more than one element at.
Could anyone please help me on how to implement this. My final goal is to improve performance by dividing the date into different ranges.
On the error that you've got: the problem is that (select start_date from dataset.t1) returns more than one element. I'm not sure what you want to achieve, but for the subquery to work it should be something like (select MIN(start_date) from dataset.t1).
I don't understand your loop, because nothing seems to change between iterations (besides inserting rows into t1). You should think about when your loop should exit.
The below works in SQL Server. You will have to find the BigQuery equivalent of a cursor. You can use an array to hold the start and end dates and loop through it.
You can create a temporary table for storing the start date and end date and call it #temp_db. Open a cursor and fetch the first start date and end date into variables from #temp_db.
select @start_date = start_date, @end_date = end_date from #temp_db
Execute the SQL:
Insert into #new_tbl
select * from #src_tbl where create_dt >= @start_date and create_dt < @end_date
For every record fetched from #temp_db, insert into a new table or the same table, as you require.
When there are no more rows to fetch from #temp_db, you will have inserted all the records into #new_tbl.
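The cursor-style loop described above can be sketched as follows. This is an illustration in Python with the standard sqlite3 module, not BigQuery syntax; the `ranges`, `src`, and `dest` tables and their sample rows are hypothetical. Each pass copies rows whose create_dt falls in the half-open interval [start_date, end_date), so a row that sits exactly on a boundary is copied once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ranges (start_date TEXT, end_date TEXT);
    CREATE TABLE src (v1 INTEGER, create_dt TEXT);
    CREATE TABLE dest (v1 INTEGER, create_dt TEXT);
    INSERT INTO ranges VALUES ('2018-09-18', '2019-03-18'),
                              ('2019-03-18', '2019-09-18'),
                              ('2019-09-18', '2020-03-18');
    INSERT INTO src VALUES (1, '2018-09-18'), (2, '2019-03-17'),
                           (3, '2019-03-18'), (4, '2020-03-18');
""")

# Fetch every range first, then run one INSERT ... SELECT per range.
ranges = con.execute(
    "SELECT start_date, end_date FROM ranges ORDER BY start_date"
).fetchall()
for start_date, end_date in ranges:
    # ISO-8601 date strings compare correctly as text.
    con.execute(
        "INSERT INTO dest (v1, create_dt) "
        "SELECT v1, create_dt FROM src WHERE create_dt >= ? AND create_dt < ?",
        (start_date, end_date),
    )

copied = con.execute("SELECT v1 FROM dest ORDER BY v1").fetchall()
con.close()
```

The loop ends on its own because it iterates over a fixed list of ranges, which addresses the exit-condition concern raised in the earlier answer.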

Calculating working time with overlapping events (SQL)

I have found similar queries on StackOverflow (e.g. Finding simultaneous events in a database between times) but nothing that matches exactly what I am after as far as I can tell so thought it OK to add as a new question.
I have a table that logs jobs (or "Activities"), with a start/end time for the job. I need to calculate working time (you can disregard non-working days, break times etc. as I have that covered). The complication is an individual can work on simultaneous jobs, overlapping at different points (the assumption is equal effort on simultaneous jobs), and the working time needs to reflect that. Minute accuracy is all that is required, not to the second.
Based on other suggestions I have this query, implemented as a table-valued function. It will look at each minute that activity is running, and if any other activities are running in the same period for the same person, and make calculations based on that. It works, but is very inefficient - taking over a minute to execute. Any ideas how I can do this more efficiently?
Running SQL 2005. I have done the obvious such as to add indexes on foreign keys by the way.
CREATE FUNCTION [dbo].[WorkActivity_WorkTimeCalculations] (@StartDate smalldatetime, @EndDate smalldatetime)
RETURNS @retActivity TABLE
(
    ActivityID bigint PRIMARY KEY NOT NULL,
    WorkMins decimal NOT NULL
)
/********************************************************************
Summary: Calculates the WORKING time on each activity running in a given date/time range
Remarks: Takes into account staff working simultaneously on jobs
         (evenly distributes working time across simultaneous jobs)
Input Params: @StartDate - the start of the period to calculate
              @EndDate - the end of the period to calculate
Output Params:
Returns: Recordset of activities and associated working time (minutes)
********************************************************************/
AS
BEGIN
    -- any work activities still running use the overall end date as the activity's end date for the purpose of calculating
    -- simultaneous jobs running
    -- POPULATE A TEMP TABLE WITH EVERY MINUTE IN THE DATE RANGE
    DECLARE @Minutes TABLE (MinuteDateTime smalldatetime NOT NULL)
    ;WITH cte AS (
        SELECT @StartDate AS myDate
        UNION ALL
        SELECT DATEADD(minute,1,myDate)
        FROM cte
        WHERE DATEADD(minute,1,myDate) <= @EndDate
    )
    INSERT INTO @Minutes (MinuteDateTime)
    SELECT myDate FROM cte
    OPTION (MAXRECURSION 0)
    -- POPULATE A TEMP TABLE WITH WORKLOAD PER EMPLOYEE PER MINUTE
    DECLARE @JobsRunningByStaff TABLE (StaffID smallint NOT NULL, MinuteDateTime smalldatetime NOT NULL, JobsRunning decimal NOT NULL)
    INSERT INTO @JobsRunningByStaff (StaffID, MinuteDateTime, JobsRunning)
    SELECT wka_StaffID, MinuteDateTime, COUNT(DISTINCT wka_ItemID) JobsRunning
    FROM dbo.WorkActivities
    INNER JOIN @Minutes ON (MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
    GROUP BY wka_StaffID, MinuteDateTime
    -- FINALLY MAKE THE CALCULATIONS FOR EACH ACTIVITY
    INSERT INTO @retActivity
    SELECT wka_ActivityID, SUM(1/JobsRunning) AS WorkMins
    FROM dbo.WorkActivities
    INNER JOIN @JobsRunningByStaff ON (wka_StaffID = StaffID AND MinuteDateTime BETWEEN wka_StartTime AND DATEADD(minute,-1,ISNULL(wka_EndTime,@EndDate)))
    GROUP BY wka_ActivityID
    RETURN
END
Some example data (sorry for the poor formatting!)...
Source Data from WorkActivities table:
ACTIVITY ID | START TIME | END TIME | STAFF ID
1 | 03/03/2016 10:30 | 03/03/2016 10:50 | 1
2 | 03/03/2016 10:40 | 03/03/2016 11:00 | 1
And the desired results for a function call of SELECT * FROM dbo.WorkActivity_WorkTimeCalculations ('03-Mar-2016 10:30','03-Mar-2016 11:30'):
ACTIVITY ID | WORKMINS
1 | 15
2 | 15
So, the results take into account between 10:40 and 10:50 there are two jobs happening simultaneously, so calculates 5 mins working time on each over that period.
As suggested by posters, indexing made a significant difference - creating an index with wka_StartTime and wka_EndTime sorted it.
(sorry, couldn't see how to mark the comments made by others as an answer!)
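For comparison, the even-split rule can also be computed without expanding every minute, by cutting each person's timeline only at activity start and end times. The Python sketch below is an illustration of that idea, not the function from the question: it counts overlapping activities rather than distinct wka_ItemID values, treats end times as exclusive, and does not clip to a reporting window. The tuple layout and the name `work_minutes` are assumptions.

```python
from collections import defaultdict
from datetime import datetime

def work_minutes(activities):
    """activities: list of (activity_id, staff_id, start, end), end exclusive."""
    by_staff = defaultdict(list)
    for act in activities:
        by_staff[act[1]].append(act)

    totals = defaultdict(float)
    for acts in by_staff.values():
        # Every start or end time is a point where the overlap count can change.
        cuts = sorted({t for _, _, s, e in acts for t in (s, e)})
        for t0, t1 in zip(cuts, cuts[1:]):
            running = [a for a, _, s, e in acts if s <= t0 and e >= t1]
            if running:
                # Split the length of this segment evenly across running activities.
                share = (t1 - t0).total_seconds() / 60 / len(running)
                for a in running:
                    totals[a] += share
    return dict(totals)

sample = [
    (1, 1, datetime(2016, 3, 3, 10, 30), datetime(2016, 3, 3, 10, 50)),
    (2, 1, datetime(2016, 3, 3, 10, 40), datetime(2016, 3, 3, 11, 0)),
]
result = work_minutes(sample)
```

For the two sample rows, the 10:40 to 10:50 segment is shared, so each activity is credited 5 minutes for that segment plus its own 10 unshared minutes.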

3rd <day_of_week> of the Month - MySQL

I'm working on a recurrence application for events. I have a date range of, say, January 1 2010 to December 31 2011. I want to return all of the 3rd Thursdays (arbitrary) of each month, efficiently. I could do this pretty trivially in code; the caveat is that it must be done in a stored procedure. Ultimately I'd want something like:
CALL return_dates(event_id);
That event_id has a start_date of 1/1/2010 and end_date of 12/31/2011. Result set would be something like:
1/20/2010
2/14/2010
3/17/2010
4/16/2010
5/18/2010
etc.
I'm just curious what the most efficient method of doing this would be, considering I might end up with a very large result set in my actual usage.
One idea that comes to mind - you can create a table and store the dates you're interested in there.
Ok, I haven't tested it, but I think the most efficient way of doing it is via a tally table which is a useful thing to have in the db anyway:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[num_seq]') AND type in (N'U'))
DROP TABLE [dbo].[num_seq];
SELECT TOP 100000 IDENTITY(int,1,1) AS n
INTO num_seq
FROM MASTER..spt_values a, MASTER..spt_values b;
CREATE UNIQUE CLUSTERED INDEX idx_1 ON num_seq(n);
You can then use this to build up the date range between the two dates. It's fast because it just uses the index (in fact often faster than a loop, so I'm led to believe).
create procedure getDates
    @eventId int
AS
begin
    declare @startdate datetime
    declare @enddate datetime
    --- get the start and end date of the event
    select @startdate = startdate,
           @enddate = enddate
    from events where eventId = @eventId
    select
        @startdate + (n - 1) AS date
    from
        dbo.num_seq tally
    where
        -- n starts at 1, so n - 1 covers every day from @startdate to @enddate inclusive
        tally.n <= datediff(day, @startdate, @enddate) + 1 and
        Datepart(dd, @startdate + (n - 1)) between 15 and 21 and
        -- '<day>' is a placeholder for the weekday number, which depends on SET DATEFIRST
        Datepart(dw, @startdate + (n - 1)) = '<day>'
end
Aside from getting the start and end dates, the key point is that the third occurrence of a given weekday in each month must fall between the 15th and the 21st inclusive.
Each day name occurs exactly once in that range, so we can locate it straight away.
If you wanted the second occurrence, just modify the range appropriately or use a parameter to calculate it.
It constructs a list of dates from the start date by adding days on (via the list of numbers in the tally table) until it reaches the end date.
Hope it helps!
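The day-of-month window is easy to check outside the database. The Python sketch below applies the same rule (the nth occurrence of a weekday falls on days 7(n-1)+1 through 7n) by walking the date range one day at a time; the function name and the Monday=0 weekday numbering are Python conventions, not part of the procedure above.

```python
from datetime import date, timedelta

def nth_weekday_dates(start, end, weekday, n=3):
    """Dates in [start, end] that are the nth given weekday of their month.

    weekday follows date.weekday(): Monday=0 ... Sunday=6.
    """
    first_day, last_day = 7 * (n - 1) + 1, 7 * n   # 15..21 when n == 3
    matches = []
    day = start
    while day <= end:
        if day.weekday() == weekday and first_day <= day.day <= last_day:
            matches.append(day)
        day += timedelta(days=1)
    return matches

# Third Thursdays (weekday 3) across the two-year range from the question.
third_thursdays = nth_weekday_dates(date(2010, 1, 1), date(2011, 12, 31), 3)
```

The tally-table query does the same filtering set-wise, which is why it needs no explicit loop.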

To get a range of values

My table, TimeList, has 2 columns, SlotID (int identity) and SlotTime (varchar), and looks like this:
SlotID SlotTime
1 8:00AM-8:15AM
2 8:15AM-8:30AM
3 8:30AM-8:45AM
4 8:45AM-9AM
5 9AM-9:30AM
and likewise up to 6:45PM-7:00PM.
If I pass 2 parameters, starttime as 8:00AM and endtime as 9AM, I want to retrieve the first 4 rows of the table above. Can anybody help me write such a stored procedure?
Would it be possible to refactor the table to look like this:
SlotID SlotStart SlotEnd
----------------------------
1 8:00am 8:15am
2 8:15am 8:30am
...
If you split the times into separate columns, it will be easier to query the date ranges. The query would look something like this:
DECLARE @StartTime datetime, @EndTime datetime
SET @StartTime = '8:00am'
SET @EndTime = '9:00am'
select SlotID, SlotStart, SlotEnd
from Slots
where SlotStart >= @StartTime
and SlotEnd <= @EndTime
Your data is not properly normalized, so it will be hard to query. A field should only contain a single value, so you should have the starting and ending time for the slot in separate fields:
SlotID StartTime EndTime
1 8:00AM 8:15AM
2 8:15AM 8:30AM
3 8:30AM 8:45AM
4 8:45AM 9:00AM
5 9:00AM 9:30AM
This also allows you to use a datetime type for the fields instead of a textual data type, so that you can easily query the table:
select SlotId, StartTime, EndTime
from TimeList
where StartTime >= '8:00AM' and EndTime <= '9:00AM'
With your original table design, you would have to use string operations to split the values in the field, and convert the values to make it comparable. If you get a lot of data in the table, this will be a killer for performance, as the query can't make use of indexes.
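To show what those string operations involve, here is a small Python sketch that splits each SlotTime value, converts both halves to minutes since midnight, and then filters. It is only an illustration of the parsing burden; the helper names are hypothetical, and the accepted formats ("8:45AM" and "9AM") are inferred from the sample rows in the question.

```python
import re

# Matches "8:45AM" or "9AM"; minutes are optional.
_TIME = re.compile(r"^(\d{1,2})(?::(\d{2}))?(AM|PM)$", re.IGNORECASE)

def to_minutes(text):
    """Convert a 12-hour clock string to minutes since midnight."""
    match = _TIME.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised time: {text!r}")
    hour = int(match.group(1)) % 12          # 12AM -> 0, 12PM -> 0 before the PM shift
    minute = int(match.group(2) or 0)
    if match.group(3).upper() == "PM":
        hour += 12
    return hour * 60 + minute

def slots_between(slots, start, end):
    """Return the SlotIDs whose whole slot lies inside [start, end]."""
    lo, hi = to_minutes(start), to_minutes(end)
    picked = []
    for slot_id, slot_time in slots:
        slot_start, slot_end = (to_minutes(part) for part in slot_time.split("-"))
        if slot_start >= lo and slot_end <= hi:
            picked.append(slot_id)
    return picked

slots = [
    (1, "8:00AM-8:15AM"), (2, "8:15AM-8:30AM"), (3, "8:30AM-8:45AM"),
    (4, "8:45AM-9AM"), (5, "9AM-9:30AM"),
]
selected = slots_between(slots, "8:00AM", "9AM")
```

Every row has to be parsed before it can be compared, which is the reason an index on the original varchar column cannot help.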
The problem is that your table is not normalized. Please read up on that at http://en.wikipedia.org/wiki/Database_normalization , it can greatly improve the quality of the systems you design.
In your current case, please follow Andy's advice and separate SlotStart and SlotEnd. Your time format is not good either. Use a DateTime format (or whatever your database offers you as its time type) or a numerical type like INT to store your values (e.g. 1800 instead of 6:00PM).
Then you can easily use
SELECT * FROM TimeList WHERE SlotStart >= ... AND SlotEnd <= ...
and select whatever you like from your table.