How to flag consecutive shifts in SQL efficiently

I have a dataset that contains 250,000 rows and is expected to grow at around 100,000 rows a month.
I have data that contains the following columns:
ShiftDate (Day a shift occurred on),
Shift Start Time,
Shift End Time
and Employee Number.
I would like to flag consecutive shifts: 1 when an employee's shift end time is within 4 hours of the start time of their next shift, and 0 otherwise.
I have tried running a query that joins the table to itself but the run time is too long. I was planning to create the flag based on a case statement using 'NextStart':
select shiftdate,
shiftstarttime,
shiftendtime,
EmployeeID,
(select min(t2.shiftstarttime) from TABLE t2 where t1.EmployeeID=t2.EmployeeID and T2.shiftstarttime > t1.Shiftendtime) as NextStart
from
TABLE t1
I would love to know a more efficient way of trying to do this.
Thanks!

Select shiftdate, shiftstarttime, shiftendtime, employeeid,
(Case when
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime) - shiftendtime < 4 then 1 else 0 end) as consecutive_shift_flag
from table_name
This query uses the lead() window function to get the next shift start time for the same employee in a single pass over the table, so no self-join is needed:
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime)
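The unit returned by the subtraction depends on the database (Oracle dates give days, for example), so the comparison with 4 may need adjusting. If this is not what you are looking for, please share a sample of the correct output for a couple of input cases.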
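The lead() approach can be tried out on a small in-memory SQLite database. This is only a sketch under assumptions: the table name shifts, its column names and the sample rows are hypothetical, timestamps are stored as ISO text, and the bundled SQLite must be 3.25 or newer for window functions. The difference is converted to hours explicitly with julianday().

```python
import sqlite3

# Hypothetical sample data: (employee_id, shift_start, shift_end) as ISO timestamps.
SHIFTS = [
    (1, "2023-01-01 08:00:00", "2023-01-01 16:00:00"),
    (1, "2023-01-01 18:00:00", "2023-01-02 02:00:00"),  # starts 2 h after the previous end
    (1, "2023-01-03 08:00:00", "2023-01-03 16:00:00"),
    (2, "2023-01-01 08:00:00", "2023-01-01 12:00:00"),
    (2, "2023-01-01 17:00:00", "2023-01-01 21:00:00"),  # starts 5 h after the previous end
]

QUERY = """
SELECT employee_id, shift_start, shift_end,
       CASE
           WHEN (julianday(LEAD(shift_start) OVER (
                     PARTITION BY employee_id ORDER BY shift_start))
                 - julianday(shift_end)) * 24.0 < 4.0
           THEN 1 ELSE 0
       END AS consecutive_shift_flag
FROM shifts
ORDER BY employee_id, shift_start
"""


def flag_consecutive_shifts(rows):
    """Return each shift with a 1/0 flag for 'next shift starts within 4 hours'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shifts (employee_id INTEGER, shift_start TEXT, shift_end TEXT)"
    )
    conn.executemany("INSERT INTO shifts VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


flags = [row[3] for row in flag_consecutive_shifts(SHIFTS)]
```

The last shift of each employee has no following row, so LEAD returns NULL, the comparison is not true, and the flag falls through to 0.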

Related

Impala get the difference between 2 dates excluding weekends

I'm trying to get the day difference between 2 dates in Impala but I need to exclude weekends.
I know it should be something like this but I'm not sure how the weekend piece would go...
DATEDIFF(resolution_date,created_date)
Thanks!
One approach to this task is to enumerate every day in the range and then filter out the weekends before counting.
Some databases have specific features to generate date series, while others offer recursive common table expressions. Impala does not support recursive queries, so we need to look at alternative solutions.
If you have a table with at least as many rows as the maximum number of days in a range, you can use row_number() to offset the starting date, and then conditional aggregation to count working days.
Assuming that your table is called mytable, with column id as primary key, and that the big table is called bigtable, you would do:
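select
t.id,
sum(
-- dayofweek() returns 1 for Sunday and 7 for Saturday, so 2..6 is Monday..Friday
case when dayofweek(date_add(t.created_date, n.rn)) between 2 and 6
then 1 else 0 end
) no_days
from mytable t
-- n.rn enumerates the day offsets 0, 1, 2, ... from created_date
inner join (select row_number() over(order by 1) - 1 rn from bigtable) n
on t.resolution_date > date_add(t.created_date, n.rn)
group by t.id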
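To check the enumerate-and-filter idea on a few rows, here is a minimal sketch that runs the same shape of query on an in-memory SQLite database instead of Impala. The numbers table stands in for bigtable and is an assumption, as are the ISO text dates. Note that SQLite's strftime('%w') numbers days from 0 (Sunday) to 6 (Saturday), so the weekday range is 1 to 5 rather than 2 to 6.

```python
import sqlite3


def working_days(rows, max_days=366):
    """rows: iterable of (id, created_date, resolution_date) as ISO date strings.

    Counts Monday-Friday days from created_date up to, but not including,
    resolution_date. Rows with an empty range produce no output row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, created_date TEXT, resolution_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Stand-in for "bigtable": one row per day offset 0..max_days-1.
    conn.execute("CREATE TABLE numbers (rn INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(max_days)])
    query = """
        SELECT t.id,
               SUM(CASE WHEN CAST(strftime('%w', date(t.created_date, '+' || n.rn || ' days'))
                                  AS INTEGER) BETWEEN 1 AND 5
                        THEN 1 ELSE 0 END) AS no_days
        FROM mytable t
        JOIN numbers n
          ON t.resolution_date > date(t.created_date, '+' || n.rn || ' days')
        GROUP BY t.id
        ORDER BY t.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# 2023-01-02 is a Monday and 2023-01-06 is a Friday.
sample = [(1, "2023-01-02", "2023-01-09"), (2, "2023-01-06", "2023-01-10")]
counts = working_days(sample)
```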

SQL Aggregation with only one table

So this problem has been bugging me a little for the last week or so. I'm working with a database which hasn't exactly been designed in a way that I like and I'm having to do a lot of work-arounds to get the queries to function in a way I would like.
Essentially, I'm trying to remove duplicate entries that are created as a side effect of an earlier entry. For the sake of argument, say that a customer places an order or issues a job (this only occurs once), but as a result of the interactions a series of other rows are created to represent sub-orders or jobs. All duplicate records should have the same finish time, so what I'm trying to create is a query that returns the record with the earliest start time and ignores all other records with the same finish time. All this occurs within the same table.
Something like:
select starttime
, endtime
, description
, entrynumber
from table
where starttime = min
and endtime = endtime
Probably what you want is something like this:
;WITH OrderedTable AS
(
Select ROW_NUMBER() OVER (PARTITION BY endtime ORDER BY starttime) as rn, starttime, endtime, description, entrynumber
From Table
)
Select starttime, endtime, description, entrynumber
FROM OrderedTable
WHERE rn=1
What this does is group all the rows with the same end time, ordered by start time and give them an additional "row number" column starting at 1 and increasing. If you filter by rn = 1, you get only the earliest start time rows, ignoring the rest.
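The partition-and-keep-first pattern can be exercised on a small in-memory SQLite database. This is a sketch only: the table name jobs and the sample rows are made up for illustration, and times are stored as text so that they sort correctly.

```python
import sqlite3


def earliest_per_endtime(rows):
    """rows: iterable of (starttime, endtime, description, entrynumber).

    Keeps, for each endtime, only the row with the earliest starttime.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (starttime TEXT, endtime TEXT, description TEXT, entrynumber INTEGER)"
    )
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH ordered AS (
            SELECT ROW_NUMBER() OVER (PARTITION BY endtime ORDER BY starttime) AS rn,
                   starttime, endtime, description, entrynumber
            FROM jobs
        )
        SELECT starttime, endtime, description, entrynumber
        FROM ordered
        WHERE rn = 1
        ORDER BY endtime
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: entries 2 and 3 are sub-jobs sharing the finish time of entry 1.
sample_jobs = [
    ("09:00", "12:00", "order", 1),
    ("09:30", "12:00", "sub-job", 2),
    ("10:00", "12:00", "sub-job", 3),
    ("13:00", "15:00", "order", 4),
]
kept = [row[3] for row in earliest_per_endtime(sample_jobs)]
```

If two rows share both the end time and the earliest start time, ROW_NUMBER picks one of them arbitrarily; adding entrynumber to the ORDER BY would make the choice deterministic.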

SELECT statement optimization

I'm not an expert in SQL queries, but not a complete newbie either.
I'm exporting data from an MS SQL database to an Excel file using a SQL query.
I'm exporting many columns, and two of these columns contain a date and an hour; these are the columns I use in the WHERE clause.
In detail, I have about 200 rows for each day, each with a different hour, over many days. I need to extract the first value after 15:00 of each day, for several days.
Since the hours are different for each day, I can't specify something like
SELECT a,b,hour,day FROM table WHERE hour='15:01'
because sometimes the value is at 15:01, sometimes at 15:03 and so on (I'm looking for the closest value after 15:00). To fix this I used this workaround:
SELECT TOP 1 a,b,hour,day FROM table WHERE hour > "15:00"
This way I can take the first value after 15:00 for one day. The problem is that I need this for several days, for a user-specified interval of days. At the moment I handle this with a UNION ALL statement, like this:
SELECT TOP 1 a,b,hour,day FROM table WHERE data="first_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="second_day" AND hour > "15:00"
UNION ALL SELECT TOP 1 a,b,hour,day FROM table WHERE data="third_day" AND hour > "15:00"
...and so on for all the days (I build the SQL string with a loop over each day in the specified interval).
Until now this has worked, but now I need to expand the interval from a maximum of one week (5 days) to up to 60 days. I don't want to build a huge query string, but I can't think of an alternative way to write the SQL.
Any help appreciated
Ettore
A typical solution for this uses row_number():
SELECT a, b, hour, day
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) as seqnum
FROM table t
WHERE hour > '15:00'
) t
WHERE seqnum = 1;
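As a rough illustration of how one query can cover a user-specified interval, the sketch below runs the same row_number() pattern on an in-memory SQLite database and adds a day range as parameters. The table name readings, the text storage of day and hour, and the sample rows are all assumptions made for the example.

```python
import sqlite3


def first_after(rows, first_day, last_day, cutoff="15:00"):
    """rows: iterable of (a, b, hour, day) with hour as 'HH:MM' and day as ISO date.

    Returns, for each day in [first_day, last_day], the earliest row after cutoff.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (a REAL, b REAL, hour TEXT, day TEXT)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT a, b, hour, day
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY day ORDER BY hour) AS seqnum
              FROM readings t
              WHERE hour > ? AND day BETWEEN ? AND ?) AS ranked
        WHERE seqnum = 1
        ORDER BY day
    """
    result = conn.execute(query, (cutoff, first_day, last_day)).fetchall()
    conn.close()
    return result


# Hypothetical sample rows.
sample_rows = [
    (1.0, 2.0, "14:59", "2023-05-01"),
    (3.0, 4.0, "15:03", "2023-05-01"),
    (5.0, 6.0, "15:10", "2023-05-01"),
    (7.0, 8.0, "15:01", "2023-05-02"),
    (9.0, 10.0, "16:00", "2023-05-03"),
]
picked = first_after(sample_rows, "2023-05-01", "2023-05-02")
```

Passing the interval as bound parameters means the SQL string stays the same size whether the range is 5 days or 60.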

how would i get the average of a previous date and update it?

I want to write a query that computes an average (that won't be hard), but once I have that average I want to save it somewhere. Let's say I have an average saved from last month in table_a.last_month_average. Now I run the query again, and the result is the current_month_average. I want to compare these two columns and see whether current_month_average increased from last_month_average.
After I compare, I would like to output the bigger of the two averages. Then I would like to move current_month_average to last_month_average, so that it becomes the old average when the query runs next month.
Is this possible in SQL, or is there a better way to do this? Any suggestions will help.
As I understand it, this operation selects the maximum monthly average from all historical records, so you don't need to keep separate current_month_average and last_month_average columns. Instead, a table holding every month's average is more helpful. Assuming there is a table named monthaverage with columns (Id, Month, Average), you can query:
SELECT TOP 1 T1.*
, CASE WHEN
T1.Average > (SELECT TOP 1 T2.Average
FROM monthaverage T2
WHERE T2.Month < T1.Month
ORDER BY Month DESC)
THEN 'Increased'
ELSE 'Not Increased'
END
FROM monthaverage T1
ORDER BY T1.Average DESC
If you can run it on SQL Server 2012 or later, you can use the LAG function, which reads the previous month's value directly. The query looks like this:
SELECT TOP 1 *, CASE WHEN Average > LAG(Average) OVER (ORDER BY Month) THEN 'Increased' ELSE 'Not Increased' END
FROM monthaverage
ORDER BY Average DESC
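The history-table idea can be tried on an in-memory SQLite database. The sketch below uses the LAG window function to read the previous month's average for every row; the monthaverage layout follows the assumption stated above, the sample values are invented, and Month is stored as 'YYYY-MM' text so that it sorts chronologically.

```python
import sqlite3


def monthly_trend(rows):
    """rows: iterable of (Id, Month, Average). Returns (Month, Average, trend) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthaverage (Id INTEGER PRIMARY KEY, Month TEXT, Average REAL)")
    conn.executemany("INSERT INTO monthaverage VALUES (?, ?, ?)", rows)
    query = """
        SELECT Month, Average,
               CASE WHEN Average > LAG(Average) OVER (ORDER BY Month)
                    THEN 'Increased' ELSE 'Not Increased' END AS trend
        FROM monthaverage
        ORDER BY Month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical history of monthly averages.
history = [(1, "2023-01", 10.0), (2, "2023-02", 12.5), (3, "2023-03", 11.0)]
trend = monthly_trend(history)
# The month with the highest average, together with its trend label.
best = max(trend, key=lambda row: row[1])
```

The first month has no predecessor, so LAG returns NULL and the row is labelled 'Not Increased'.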

MySQL to get the count of rows that fall on a date for each day of a month

I have a table that contains a list of community events with columns for the days the event starts and ends. If the end date is 0 then the event occurs only on the start day. I have a query that returns the number of events happening on any given day:
SELECT COUNT(*) FROM p_community e WHERE
(TO_DAYS(e.date_ends)=0 AND DATE(e.date_starts)=DATE('2009-05-13')) OR
(DATE('2009-05-13')>=DATE(e.date_starts) AND DATE('2009-05-13')<=DATE(e.date_ends))
I just sub in any date I want to test for "2009-05-13".
I need to be able to fetch this data for every day in an entire month. I could just run the query against each day one at a time, but I'd rather run one query that can give me the entire month at once. Does anyone have any suggestions on how I might do that?
And no, I can't use a stored procedure.
Try:
SELECT COUNT(*), DATE(date) FROM table WHERE DATE(dtCreatedAt) >= DATE('2009-03-01') AND DATE(dtCreatedAt) <= DATE('2009-03-10') GROUP BY DATE(date);
This would get the count for each day from 2009-03-01 to 2009-03-10.
UPDATED: Now works on a range of dates spanning months/years.
Unfortunately, MySQL before version 8.0 lacks a way to generate a rowset with a given number of rows.
You can create a helper table:
CREATE TABLE t_day (day INT NOT NULL PRIMARY KEY);
INSERT
INTO t_day (day)
VALUES (1),
(2),
…,
(31)
and use it in a JOIN:
SELECT day, COUNT(*)
FROM t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day
Or you may use an ugly subquery:
SELECT day, COUNT(*)
FROM (
SELECT 1 AS day
UNION ALL
SELECT 2 AS day
…
UNION ALL
SELECT 31 AS day
) t_day
JOIN p_community e
ON day BETWEEN DATE(e.start) AND IF(DATE(e.end), DATE(e.end), DATE(e.start))
GROUP BY
day
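A variation on the helper-table join, sketched on an in-memory SQLite database rather than MySQL. It departs from the queries above in a few labelled ways: t_day stores full ISO dates instead of day numbers, so the range can span months, a single-day event is represented by a NULL date_ends instead of 0, and a LEFT JOIN is used so that days without events are reported with a count of 0.

```python
import sqlite3
from datetime import date, timedelta


def events_per_day(events, first_day, last_day):
    """events: iterable of (date_starts, date_ends) ISO strings; date_ends may be None.

    Returns (day, number_of_events) for every day from first_day to last_day inclusive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE p_community (date_starts TEXT, date_ends TEXT)")
    conn.executemany("INSERT INTO p_community VALUES (?, ?)", events)
    # Helper calendar table: one row per day in the requested range.
    conn.execute("CREATE TABLE t_day (day TEXT PRIMARY KEY)")
    n_days = (last_day - first_day).days + 1
    conn.executemany(
        "INSERT INTO t_day VALUES (?)",
        [((first_day + timedelta(days=i)).isoformat(),) for i in range(n_days)],
    )
    query = """
        SELECT d.day, COUNT(e.date_starts) AS n_events
        FROM t_day d
        LEFT JOIN p_community e
          ON d.day BETWEEN e.date_starts AND COALESCE(e.date_ends, e.date_starts)
        GROUP BY d.day
        ORDER BY d.day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical events: one three-day event and two single-day events.
sample_events = [("2009-05-01", "2009-05-03"), ("2009-05-02", None), ("2009-05-05", None)]
per_day = events_per_day(sample_events, date(2009, 5, 1), date(2009, 5, 5))
```

COUNT(e.date_starts) counts only matched events, which is what lets an empty day show up as 0 instead of 1.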