SQL working days query - sql

I am trying to write a query to calculate the number of working days between 2 dates. I first tried this in VBA, which worked, but this is not very efficient.
I have 2 queries, the first works out the date difference between a valueDate and cutOffDate; the second counts the number of dates from a table of holidays/weekends that fall between the valueDate and cutOffDay.
The trouble I'm having is how to combine these 2 parts to give the item age (number of working days between the dates).
My query examples are:
SELECT allOS.SortCode, allOS.NPA, allOS.valueDate, allOS.cutOffDate,
DateDiff("d",[allOS.valueDate],[allOS.cutOffDate]) AS Age
FROM allOS;
and
SELECT Count(Holidays.Holiday) AS NonWorkingDays
FROM Holidays
WHERE Holiday > #01/01/2013# AND Holiday < #11/06/2013#;
I need to subtract the result of the second query from the Age of the first query.
Sample input and output data
allOS:
sortCode|npa|valueDate|cutOffDate
111111|99999999|01-11-2013|15-11-2013
222222|77777777|04-11-2013|15-11-2013
333333|88888888|05-11-2013|15-11-2013
444444|66666666|06-11-2013|15-11-2013
555555|44444444|07-11-2013|15-11-2013
666666|33333333|12-11-2013|15-11-2013
777777|55555555|13-11-2013|15-11-2013
888888|11111111|14-11-2013|15-11-2013
999999|22222222|15-11-2013|15-11-2013
Holidays:
holiday|reason
02-11-2013|Saturday
03-11-2013|Sunday
08-11-2013|Long Weekend
09-11-2013|Saturday
10-11-2013|Sunday
11-11-2013|Long Weekend
16-11-2013|Saturday
17-11-2013|Sunday
Result:
sortCode|npa|valueDate|cutOffDate|Age
111111|99999999|01-11-2013|15-11-2013|8
222222|77777777|04-11-2013|15-11-2013|7
333333|88888888|05-11-2013|15-11-2013|6
444444|66666666|06-11-2013|15-11-2013|5
555555|44444444|07-11-2013|15-11-2013|4
666666|33333333|12-11-2013|15-11-2013|3
777777|55555555|13-11-2013|15-11-2013|2
888888|11111111|14-11-2013|15-11-2013|1
999999|22222222|15-11-2013|15-11-2013|0
The Age result is the difference in days between the valueDate and cutOffDate, less any days from the Holidays table that fall between them.

You can use a correlated subquery to calculate the number of non-work days included in each valueDate and cutOffDate date range.
Here is a preliminary query I tested with your sample data, and I included the first and last rows output from that query.
SELECT
a.sortCode,
a.npa,
a.valueDate,
a.cutOffDate,
DateDiff('d', a.valueDate, a.cutOffDate) AS raw_days,
(
SELECT Count(*)
FROM Holidays
WHERE holiday BETWEEN a.valueDate AND a.cutOffDate
) AS NonWorkDays
FROM allOS AS a;
sortCode npa valueDate cutOffDate raw_days NonWorkDays
-------- -------- ---------- ---------- -------- -----------
111111 99999999 11/1/2013 11/15/2013 14 6
999999 22222222 11/15/2013 11/15/2013 0 0
Notice the last row. The raw_days value is zero because both valueDate and cutOffDate are the same. If you want that to be one day, add one to the value returned by the DateDiff expression.
After you adjust that preliminary query as needed, you can use it as the data source for another query where you can calculate Age as raw_days - NonWorkDays. But I'll leave that final piece for you in case I've botched the preliminary query.
If subqueries are unfamiliar to you, I recommend two of Allen Browne's pages for useful background information: Subquery basics and Surviving Subqueries.
Also note that correlated subqueries demand extra work from the db engine. That SELECT Count(*) subquery must be run separately for each row of the table. You should have Holidays.holiday indexed to ease the db engine's burden.
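For reference, the final step (Age = raw_days - NonWorkDays) can be sketched end to end. This is a minimal sketch and not Access SQL: it assumes SQLite, driven from Python's sqlite3 module, as a stand-in engine, with ISO-formatted date text and julianday() in place of DateDiff. The table and column names come from the question; the index name and the derived-table alias prelim are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Access; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allOS (sortCode TEXT, npa TEXT, valueDate TEXT, cutOffDate TEXT);
CREATE TABLE Holidays (holiday TEXT, reason TEXT);
CREATE INDEX idx_holidays_holiday ON Holidays (holiday);
""")

# Sample rows from the question; every row has the same cutOffDate.
conn.executemany(
    "INSERT INTO allOS VALUES (?, ?, ?, '2013-11-15')",
    [
        ("111111", "99999999", "2013-11-01"),
        ("222222", "77777777", "2013-11-04"),
        ("333333", "88888888", "2013-11-05"),
        ("444444", "66666666", "2013-11-06"),
        ("555555", "44444444", "2013-11-07"),
        ("666666", "33333333", "2013-11-12"),
        ("777777", "55555555", "2013-11-13"),
        ("888888", "11111111", "2013-11-14"),
        ("999999", "22222222", "2013-11-15"),
    ],
)
conn.executemany(
    "INSERT INTO Holidays VALUES (?, ?)",
    [
        ("2013-11-02", "Saturday"), ("2013-11-03", "Sunday"),
        ("2013-11-08", "Long Weekend"), ("2013-11-09", "Saturday"),
        ("2013-11-10", "Sunday"), ("2013-11-11", "Long Weekend"),
        ("2013-11-16", "Saturday"), ("2013-11-17", "Sunday"),
    ],
)

# The preliminary query becomes a derived table; the outer query subtracts.
rows = conn.execute("""
SELECT sortCode, raw_days - NonWorkDays AS Age
FROM (
    SELECT a.sortCode,
           CAST(julianday(a.cutOffDate) - julianday(a.valueDate) AS INTEGER) AS raw_days,
           (SELECT COUNT(*)
            FROM Holidays AS h
            WHERE h.holiday BETWEEN a.valueDate AND a.cutOffDate) AS NonWorkDays
    FROM allOS AS a
) AS prelim
ORDER BY sortCode
""").fetchall()

ages = [age for _, age in rows]
print(ages)
```

With the sample data this reproduces the Age column requested in the question. In Access the same shape should apply with the saved preliminary query as the data source, but that part is untested here.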

This is easy if your database supports it: read about the WITH clause.
With it, you can define the first query and the second query as named subqueries, then take their results and process them in a third query that follows the WITH clause.
http://www.oracle-base.com/articles/misc/with-clause.php
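As a rough illustration of the WITH approach, the sketch below names the two intermediate queries and combines them in a third. It assumes SQLite (via Python's sqlite3) rather than Oracle, ISO date text, and that sortCode is unique per row; the CTE names raw_age and nonwork are hypothetical, and only two of the sample rows are loaded to keep it short.

```python
import sqlite3

# Assumption: SQLite as a stand-in engine that supports WITH (CTEs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allOS (sortCode TEXT, valueDate TEXT, cutOffDate TEXT);
CREATE TABLE Holidays (holiday TEXT);
INSERT INTO allOS VALUES
    ('111111', '2013-11-01', '2013-11-15'),
    ('666666', '2013-11-12', '2013-11-15');
INSERT INTO Holidays VALUES
    ('2013-11-02'), ('2013-11-03'), ('2013-11-08'),
    ('2013-11-09'), ('2013-11-10'), ('2013-11-11');
""")

cte_rows = conn.execute("""
WITH raw_age AS (
    -- first query: plain day difference per item
    SELECT sortCode, valueDate, cutOffDate,
           CAST(julianday(cutOffDate) - julianday(valueDate) AS INTEGER) AS raw_days
    FROM allOS
),
nonwork AS (
    -- second query: holidays falling inside each item's date range
    SELECT r.sortCode, COUNT(h.holiday) AS non_work_days
    FROM raw_age AS r
    LEFT JOIN Holidays AS h
           ON h.holiday BETWEEN r.valueDate AND r.cutOffDate
    GROUP BY r.sortCode
)
-- third query: combine the two
SELECT r.sortCode, r.raw_days - n.non_work_days AS age
FROM raw_age AS r
JOIN nonwork AS n ON n.sortCode = r.sortCode
ORDER BY r.sortCode
""").fetchall()

print(cte_rows)
```

The LEFT JOIN keeps items with no holidays in range, since COUNT(h.holiday) is then zero rather than the row disappearing.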

Related

SELECT MIN from a subset of data obtained through GROUP BY

There is a database in place with hourly timeseries data, where every row in the DB represents one hour. Example:
TIMESERIES TABLE
id date_and_time entry_category
1 2017/01/20 12:00 type_1
2 2017/01/20 13:00 type_1
3 2017/01/20 12:00 type_2
4 2017/01/20 12:00 type_3
First I used the GROUP BY statement to find the latest date and time for each type of entry category:
SELECT MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category;
Now, however, I want to find the LEAST RECENT date and time among the datetimes I obtained with the query above. I will somehow need to use SELECT MIN(date_and_time), but how do I let SQL know I want to treat the output of my previous query as a "new table" to apply a new SELECT query on? The output of my total query should be a single value; for the sample displayed above, date_and_time = 2017/01/20 12:00.
I've tried using aliases, but I can't get them to do the trick: they only rename existing columns or tables (or I'm misusing them). There are many questions out there that try to list the MAX or MIN for a particular group (e.g. https://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ or Select max value of each group), which is what I have already achieved, but I now want to work on this list of obtained datetimes. My database structure is very simple, but I lack the knowledge to string these queries together.
Thanks, cheers!
You can use your first query as a subquery (a derived table), which matches what you describe: the first query's output becomes the input for the second query. This returns the single-row output of the minimum date, as required.
SELECT MIN(date_and_time)
FROM (SELECT MAX(date_and_time) as date_and_time, entry_category
FROM timeseries_table
GROUP BY entry_category) AS a;
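The derived-table query can be checked against the sample rows from the question. This is a small sketch that assumes SQLite through Python's sqlite3 module and stores date_and_time as fixed-width text, so MIN and MAX compare it correctly as strings; the alias latest is illustrative.

```python
import sqlite3

# Assumption: SQLite as the engine; timestamps kept as fixed-format text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE timeseries_table (id INTEGER, date_and_time TEXT, entry_category TEXT);
INSERT INTO timeseries_table VALUES
    (1, '2017/01/20 12:00', 'type_1'),
    (2, '2017/01/20 13:00', 'type_1'),
    (3, '2017/01/20 12:00', 'type_2'),
    (4, '2017/01/20 12:00', 'type_3');
""")

# Inner query: latest timestamp per category.
# Outer query: least recent of those latest timestamps.
least_recent = conn.execute("""
SELECT MIN(date_and_time)
FROM (
    SELECT MAX(date_and_time) AS date_and_time, entry_category
    FROM timeseries_table
    GROUP BY entry_category
) AS latest
""").fetchone()[0]

print(least_recent)
```

For the sample data the per-category maxima are 13:00, 12:00 and 12:00, so the outer MIN is 2017/01/20 12:00, the value the question asks for.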
Is this what you want?
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC;
In databases such as MS Access, TOP 1 returns every row tied on the sort key. If you do not want ties, include an additional sort key:
SELECT TOP 1 MAX(date_and_time), entry_category
FROM timeseries_table
GROUP BY entry_category
ORDER BY MAX(date_and_time) ASC, entry_category;

How to pull the last N number of dates in SAS SQL

I am dealing with a large dataset (30 million rows) and I need to pull the most recent three dates, each of which may have an indeterminate number of rows attached: for example, 03MAR2016 might have 2 rows, 27FEB2016 might have ten, and 25FEB2016 might have 3. How do I say "Select everything that falls within the last X number of values in this set, regardless of how many rows there are"?
Because you cannot sort in an in-line view/subquery, you will have to split your SQL statement into two parts:
1. Sort the dates in descending order and keep the distinct values.
2. Join back to the original data, limited to the first 3 dates.
But as stated before, SQL is not good at this kind of operation.
DATA input_data ;
INPUT date value ;
CARDS ;
20160101 1
20160101 2
20160101 3
20160102 1
20160103 1
20160104 1
20160105 1
20160105 2
20160105 3
;
proc sql _method;
create table DATE_ID as
select distinct DATE
from input_data
order by DATE DESC;
create table output_data as
select data.*
from (select *
from DATE_ID
where monotonic() <= 3
) id
inner join input_data data
on id.DATE = data.DATE
;
quit;
You need to break this down into two tasks.
1. Determine which dates are the last three dates.
2. Pull all rows from those dates.
Both are possible in SQL, though the first is much easier using other methods (SAS's SQL isn't very good at getting the "first X things").
I would suggest using something like PROC FREQ or PROC TABULATE to generate the list of dates (just a PROC FREQ on the date variable), really any proc you're comfortable with - even PROC SORT would work (though that's probably less efficient). Then once you have that table, limit it to the three highest observations, and then you can use it in a SQL step to join to the main table and filter to those three dates - or you can use other options, like creating a custom format or hash tables or whatever works for you. 30 million rows isn't so many that a SQL join should be a problem, though, I'd think.
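To make the two tasks concrete outside SAS, here is a minimal sketch that assumes SQLite through Python's sqlite3 module. Unlike SAS PROC SQL, SQLite allows ORDER BY and LIMIT inside a subquery, so both tasks fit in one statement; this shows the logic only and is not a SAS solution. The data is the input_data sample from the answer above.

```python
import sqlite3

# Assumption: SQLite as a stand-in; dates kept as yyyymmdd integers as in the sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (date INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO input_data VALUES (?, ?)",
    [
        (20160101, 1), (20160101, 2), (20160101, 3),
        (20160102, 1), (20160103, 1), (20160104, 1),
        (20160105, 1), (20160105, 2), (20160105, 3),
    ],
)

last_three = conn.execute("""
SELECT date, value
FROM input_data
WHERE date IN (
    -- task 1: the three most recent distinct dates
    SELECT DISTINCT date
    FROM input_data
    ORDER BY date DESC
    LIMIT 3
)
-- task 2: every row belonging to those dates
ORDER BY date, value
""").fetchall()

print(last_three)
```

The three most recent dates in the sample are 20160103, 20160104 and 20160105, so five rows come back regardless of how many rows each date carries.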

12 month moving average by person, date

I have a table [production] that contains the following structure:
rep (char(10))
,cyc_date (datetime) ---- already standardized to mm/01/yyyy
,amt (decimal)
I have data for each rep from 1/1/2011 to 8/1/2013. What I want to be able to do is create a 12 month moving average beginning 1/1/2012 for each rep, as follows:
rep cyc_dt 12moAvg
-------------------------
A 1/1/2012 10000.01
A 2/1/2012 13510.05
. ........ ........
A 8/1/2013 22101.32
B 1/1/2012 98328.22
B ........ ........
where each row represents the 12 month moving average for said rep at stated time. I found some examples that were vaguely close and I tried them to no avail. It seems the addition of a group by rep component is the major departure from other examples.
This is about as far as I got:
SELECT
rep,
cyc_date,
(
SELECT Avg([amt])
FROM production Q
WHERE Q.[cyc_date] BETWEEN DateAdd("yyyy",-1,[cyc_date]+1) AND [cyc_date]
) AS 12moavg
FROM production
That query seems to pull an overall average or sum, since there is no grouping in the correlated subquery. When I try to group by, I get an error that it can only return at most one row.
I think it may work with 2 adjustments to the correlated subquery.
1. Subtract 11 months in the DateAdd() expression.
2. Include another WHERE condition to limit the average to the same rep as the current row of the parent (containing) query.
SELECT
p.rep,
p.cyc_date,
(
SELECT Avg(Q.amt)
FROM production AS Q
WHERE
Q.rep = p.rep
AND
Q.cyc_date BETWEEN DateAdd("m", -11, p.cyc_date)
AND p.cyc_date
) AS [12moavg]
FROM production AS p;
Correlated subqueries can be slow. Make sure to index rep and cyc_date to limit the pain with this one.
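The two adjustments can be tried on synthetic data. This is a minimal sketch that assumes SQLite through Python's sqlite3 module, ISO date text, and date(x, '-11 months') in place of DateAdd("m", -11, x). The amounts are made up (rep A rises by 1 each month, rep B is flat at 100) so the expected averages are easy to check by hand; the index name is hypothetical.

```python
import sqlite3

# Assumption: SQLite as a stand-in for Access; cyc_date stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (rep TEXT, cyc_date TEXT, amt REAL)")
conn.execute("CREATE INDEX idx_production_rep_date ON production (rep, cyc_date)")

# Fourteen months of made-up data per rep: 2011-01-01 through 2012-02-01.
data = []
for i in range(14):
    cyc_date = f"{2011 + i // 12}-{i % 12 + 1:02d}-01"
    data.append(("A", cyc_date, i + 1))   # 1, 2, ..., 14
    data.append(("B", cyc_date, 100))     # constant
conn.executemany("INSERT INTO production VALUES (?, ?, ?)", data)

moving_avg = conn.execute("""
SELECT p.rep,
       p.cyc_date,
       (SELECT AVG(q.amt)
        FROM production AS q
        WHERE q.rep = p.rep                       -- same rep as the outer row
          AND q.cyc_date BETWEEN date(p.cyc_date, '-11 months')
                             AND p.cyc_date       -- current month plus 11 prior
       ) AS avg_12mo
FROM production AS p
WHERE p.cyc_date >= '2012-01-01'
ORDER BY p.rep, p.cyc_date
""").fetchall()

for row in moving_avg:
    print(row)
```

For rep A the January 2012 window covers the values 2 through 13 (average 7.5) and the February window covers 3 through 14 (average 8.5), while rep B stays at 100.0, which shows the rep filter keeps the two series apart.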

Getting repeated rows for where with or condition

I am trying to find employees who worked during a specific time period, along with the hours they worked during that period. My query has to join the employee table, which has the employee id as its primary key and uses effective_date and expiration_date to bound the employee's position, to the timekeeping table, which has a pay period id number as its primary key and also uses effective and expiration dates.
The problem with the expiration date in the employee table is that if the employee is currently employed, the date is '12/31/9999'. I am looking for employees who worked in a certain year, including current employees, and the hours they worked, broken out by pay period.
When I account for this condition in the WHERE clause with an OR, I get duplicates: employees who worked in the time period I am looking for and beyond, as well as duplicate records for the '12/31/9999' rows and the valid employee in that time period.
This is the query I am using:
SELECT
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.EXP_DT
,TM1.EFF_DT
--PULLING IN THE DAILY HRS WORKED
,(SELECT NVL(SUM(((to_number(SUBSTR(TI.DAY_1, 1
,INSTR(TI.DAY_1, ':', 1, 1)-1),99))*60)+
(TO_NUMBER(SUBSTR(TI.DAY_1
,INSTR(TI.DAY_1,':', -1, 1)+1),99))),0)
FROM PPRD_LINE TI
WHERE
TI.PPRD_ID=TM1.PPRD_ID
) "DAY1"
---AND THE REST OF THE DAYS FOR THE WORK PERIOD
FROM PPRD_LINE TM1
JOIN EMPL J ON TM1.EMPL_ID=J.EMPL_ID
WHERE
J.EMPL_ID='some id number' --for test purposes, will need to break down to depts-
AND
J.EFF_DT >=TO_DATE('1/1/2012','MM/DD/YYYY')
AND
(
J.EXP_DT<=TO_DATE('12/31/2012','MM/DD/YYYY')
OR
J.EXP_DT=TO_DATE('12/31/9999','MM/DD/YYYY') --I think the problem might be here???
)
GROUP BY
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.DOC_ID
,TM1.EXP_DT
,TM1.EFF_DT
ORDER BY
J.EFF_DT
,TM1.EFF_DT
,TM1.EXP_DT
I'm pretty sure I'm missing something simple but at this point I can't see the forest for the trees. Can anyone out there point me in the right direction?
An example of the duplicate records, for employee 1 for the year 2012:
Empl_ID|Dept|Unit|Last|First|Title|Eff Date|Exp Date|PPRD ID|Empl_ID|Exp Date_1|Eff Date_1
00001|04|012|Babbage|Charles|Somejob|4/1/2012|10/15/2012|0407123|00001|4/15/2012|4/1/2012
This record repeats 3 times and goes past the pay periods in 2012 to the current pay period in 2013.
The subquery is what I use to convert the time values so that I can add hours and minutes together and compare them down the line.
I'm going to take a wild guess at what you want. Remember that I could not test this, so there may be typos.
Whether or not this is what you were after, you should read the FAQ section on how to ask good questions. A clear question here would normally have been answered within about 10 minutes; because it was not clear what you were asking, no one could answer it.
You should include the inputs, the actual output, and the EXPECTED output in your question. The data you gave was not the output of the select statement (it did not have the DAY1 column).
SELECT
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
-- ,TM1.EXP_DT Can't have these if you are summing across multiple records.
-- ,TM1.EFF_DT
--PULLING IN THE DAILY HRS WORKED
,NVL(SUM(((to_number(SUBSTR(TM1.DAY_1, 1,INSTR(TM1.DAY_1, ':', 1, 1)-1),99))*60)+
(TO_NUMBER(SUBSTR(TM1.DAY_1,INSTR(TM1.DAY_1,':', -1, 1)+1),99))),0)
"DAY1"
---AND THE REST OF THE DAYS FOR THE WORK PERIOD
FROM PPRD_LINE TM1
JOIN EMPL J ON TM1.EMPL_ID=J.EMPL_ID
WHERE
J.EMPL_ID='some id number' --for test purposes, will need to break down to depts-
AND J.EFF_DT >=TO_DATE('1/1/2012','MM/DD/YYYY')
AND(J.EXP_DT<=TO_DATE('12/31/2012','MM/DD/YYYY') OR J.EXP_DT=TO_DATE('12/31/9999','MM/DD/YYYY'))
GROUP BY
J.EMPL_ID
,J.DEPT
,J.UNIT
,J.LAST_NM
,J.FIRST_NM
,J.TITLE
,J.EFF_DT -- selected without an aggregate, so it must be grouped
,J.EXP_DT
,TM1.PPRD_ID
,TM1.EMPL_ID
,TM1.DOC_ID
ORDER BY
MIN(J.EFF_DT)
,MAX(TM1.EFF_DT)
,MAX(TM1.EXP_DT)
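The DAY1 expression above turns an hours:minutes string into minutes and sums it per pay period. A minimal sketch of that arithmetic follows, assuming SQLite through Python's sqlite3 module (with COALESCE, substr and instr standing in for Oracle's NVL, SUBSTR and INSTR) and assuming DAY_1 holds text such as '8:30'. The sample hours and the second pay period id are made up.

```python
import sqlite3

# Assumption: SQLite as a stand-in for Oracle; DAY_1 is 'H:MM' text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pprd_line (pprd_id TEXT, empl_id TEXT, day_1 TEXT);
INSERT INTO pprd_line VALUES
    ('0407123', '00001', '8:30'),
    ('0407123', '00001', '7:45'),
    ('0408123', '00001', '4:00');   -- hypothetical second pay period
""")

day1_minutes = conn.execute("""
SELECT pprd_id,
       COALESCE(SUM(
           -- hours part (text before the colon) converted to minutes
           CAST(substr(day_1, 1, instr(day_1, ':') - 1) AS INTEGER) * 60
           -- plus the minutes part (text after the colon)
           + CAST(substr(day_1, instr(day_1, ':') + 1) AS INTEGER)
       ), 0) AS day1
FROM pprd_line
GROUP BY pprd_id
ORDER BY pprd_id
""").fetchall()

print(day1_minutes)
```

Here 8:30 and 7:45 become 510 and 465 minutes, so the first pay period sums to 975. Aggregating directly in the grouped query, as the answer does, avoids repeating a correlated subquery for every output row.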

How to use MIN, IIf and DatePart functions together in MS ACCESS?

I have a table like the one below. I want to group the date/time entries by date (ignoring the time) and by shift; for example, the morning shift starts at 5 AM and ends at 2 PM. The MAX function finds the correct date. Could you help me see what's wrong with the MIN function?
Indate Incondition
--------- -----------
25.01.2013 05:00:38 KT-RING
25.01.2013 05:21:52 KT-EMPTY
25.01.2013 05:22:00 KT-PROCESS
25.01.2013 06:10:50 KT-RING
25.01.2013 16:10:50 KT-EMPTY
26.01.2013 06:10:50 KT-RING
SELECT Int(Indate) AS DATE,
Min( IIf( ( DatePart('h',[Indate])>=05 AND DatePart('h', [Indate])<13), Indate, 0)) AS FRUHRINGMIN,
Max(IIf((DatePart('h',Indate)>=05 And
DatePart('h',Indate)<13), Indate,0)) AS FRUHRINGMAX
FROM TABLE WHERE Incondition= 'KT-RING'
GROUP BY Int(Indate);
RESULT:
DATE FRUHRINGMIN FRUHRINGMAX
----- ------------- -----------
25.01.2013 00:00:00 25.01.2013 06:10:50
26.01.2013 00:00:00 26.01.2013 06:10:50
I saved your sample data in a table in my Access 2007 database. But when I attempted to run your query, Access threw an error about the alias DATE, which is a reserved word. Bracketing that alias allowed the query to run without error.
SELECT
Int(Indate) AS [DATE],
Min(IIf((DatePart('h',[Indate])>=05 AND DatePart('h', [Indate])<13), Indate, 0)) AS FRUHRINGMIN,
Max(IIf((DatePart('h',Indate)>=05 And DatePart('h',Indate)<13), Indate,0)) AS FRUHRINGMAX
FROM tblJeanneQuadel
WHERE Incondition= 'KT-RING'
GROUP BY Int(Indate);
However the results it gave me did not match what you reported.
DATE FRUHRINGMIN FRUHRINGMAX
41299 1/25/2013 5:00:38 AM 1/25/2013 6:10:50 AM
41300 1/26/2013 6:10:50 AM 1/26/2013 6:10:50 AM
Note my Date/Time values are in US format, but they're actually based on the same values as yours, just displayed differently.
I don't understand why your query result displayed the first column as a date rather than a long integer as mine did and which I would expect as the result from Int(Indate). But that's a minor point; we can convert from one to the other if needed.
More importantly, I'm unsure about what's actually going on. If bracketing [DATE] does not allow your query to run and produce the correct results, try moving the IIf() condition into the WHERE clause. That would greatly simplify your Min() and Max() expressions. But if that still doesn't produce exactly the results you want, show us what it does return and what you want returned instead.
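The suggestion to move the IIf() condition into the WHERE clause can be sketched as follows. This is an illustration under assumptions rather than Access SQL: it uses SQLite through Python's sqlite3 module, stores Indate as ISO text, and uses strftime('%H', ...) in place of DatePart('h', ...). The table name readings is hypothetical.

```python
import sqlite3

# Assumption: SQLite as a stand-in for Access; Indate stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (indate TEXT, incondition TEXT);
INSERT INTO readings VALUES
    ('2013-01-25 05:00:38', 'KT-RING'),
    ('2013-01-25 05:21:52', 'KT-EMPTY'),
    ('2013-01-25 05:22:00', 'KT-PROCESS'),
    ('2013-01-25 06:10:50', 'KT-RING'),
    ('2013-01-25 16:10:50', 'KT-EMPTY'),
    ('2013-01-26 06:10:50', 'KT-RING');
""")

shift_rows = conn.execute("""
SELECT date(indate) AS day,
       MIN(indate)  AS fruhringmin,
       MAX(indate)  AS fruhringmax
FROM readings
WHERE incondition = 'KT-RING'
  -- the shift-hours test lives here instead of inside IIf()
  AND CAST(strftime('%H', indate) AS INTEGER) >= 5
  AND CAST(strftime('%H', indate) AS INTEGER) < 13
GROUP BY date(indate)
ORDER BY day
""").fetchall()

for row in shift_rows:
    print(row)
```

Because rows outside the shift hours are filtered out before aggregation, MIN never sees a placeholder 0, so a zero date cannot be returned as the minimum.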