SQL Aggregation with only one table - sql

This problem has been bugging me for the last week or so. I'm working with a database that wasn't designed the way I would like, and I'm having to use a lot of workarounds to get my queries to behave.
Essentially, I'm trying to remove duplicate entries that are created as a side effect of an earlier entry. For the sake of argument, say a customer places an order or issues a job (this only occurs once), but as a result of the interactions a series of other rows are created to represent sub-orders or jobs. All duplicate records should have the same finish time, so what I'm trying to create is a query that returns the record with the earliest start time and ignores all other records with the same finish time. All of this occurs within the same table.
Something like:
select starttime
, endtime
, description
, entrynumber
from table
where starttime = min
and endtime = endtime

Probably what you want is something like this:
;WITH OrderedTable AS
(
Select ROW_NUMBER() OVER (PARTITION BY endtime ORDER BY starttime) as rn, starttime, endtime, description, entrynumber
From Table
)
Select starttime, endtime, description, entrynumber
FROM OrderedTable
WHERE rn=1
What this does is group all the rows with the same end time, order them by start time, and give them an additional "row number" column starting at 1 and increasing. If you filter by rn = 1, you get only the row with the earliest start time for each end time, ignoring the rest.

Related

How to flag consecutive shifts in SQL efficiently

I have a dataset that contains 250,000 rows and is expected to grow at around 100,000 rows a month.
I have data that contains the following columns:
ShiftDate (Day a shift occurred on),
Shift Start Time,
Shift End Time
and Employee Number.
I would like to flag consecutive shifts with a 1 when an employee's Shift End Time is within 4 hours of the start time of their next shift, and otherwise flag it with a 0.
I have tried running a query that joins the table to itself but the run time is too long. I was planning to create the flag based on a case statement using 'NextStart':
select shiftdate,
shiftstarttime,
shiftendtime,
EmployeeID,
(select min(t2.shiftstarttime) from TABLE t2 where t1.EmployeeID=t2.EmployeeID and T2.shiftstarttime > t1.Shiftendtime) as NextStart
from
TABLE t1
I would love to know a more efficient way of trying to do this.
Thanks!
Select shiftdate, shiftstarttime, shiftendtime, employeeid,
(Case when
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime) - shiftendtime < 4 then 1 else 0 end) as consecutive_shift_flag
from table_name
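In this query, the lead() window function is used to get the start time of that employee's next shift:
lead(shiftstarttime, 1) over (partition by employeeid order by shiftdate, shiftstarttime)
Note that the unit of the subtraction depends on the database and the column types (for example, subtracting Oracle DATE values yields days, not hours), so the comparison with 4 may need adjusting.
If this is not what you are looking for, please share a sample of the correct output for a couple of input cases.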
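To make the lead() approach concrete, here is a minimal sketch that runs it against SQLite through Python's sqlite3 module. The table name `shifts`, the sample rows, and the use of julianday() to convert the gap into hours are assumptions made for illustration; other databases need their own date arithmetic, and SQLite 3.25 or later is assumed for window function support.

```python
import sqlite3

# Hypothetical table and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts (employeeid INTEGER, shiftstarttime TEXT, shiftendtime TEXT)"
)
conn.executemany(
    "INSERT INTO shifts VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 08:00:00", "2020-01-01 16:00:00"),
        (1, "2020-01-01 19:00:00", "2020-01-01 23:00:00"),  # starts 3 h after previous end
        (1, "2020-01-02 09:00:00", "2020-01-02 17:00:00"),  # starts 10 h after previous end
        (2, "2020-01-01 08:00:00", "2020-01-01 12:00:00"),  # only shift for employee 2
    ],
)

# julianday() returns days, so the difference is multiplied by 24 to get hours.
# When there is no next shift, LEAD() is NULL and the CASE falls through to 0.
query = """
SELECT employeeid,
       shiftstarttime,
       CASE
           WHEN (julianday(LEAD(shiftstarttime) OVER (
                     PARTITION BY employeeid ORDER BY shiftstarttime))
                 - julianday(shiftendtime)) * 24 < 4
           THEN 1 ELSE 0
       END AS consecutive_shift_flag
FROM shifts
ORDER BY employeeid, shiftstarttime
"""
flags = [row[2] for row in conn.execute(query)]
print(flags)
```

Only the first shift of employee 1 is flagged, because it is the only one followed by another shift within 4 hours. The table is read once, which is why this tends to scale better than the correlated subquery in the question.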

Subquery sorted by another table

This is a follow-up question to Get the following record in query.
But the task is a bit more complicated. I tried to modify the SQL query but I was not able to fulfill the task.
Suppose we have two tables: one called Activity, with columns [ActivityCode and StartTime], and another called Students, with columns [Name and ID],
for example:
Name   ID    ActivityCode   StartTime
Tom    123   Lunch          1200
Tom    123   MathClass      1300
Tom    123   EnglishClass   1500
Tom    123   EndOfSchool    1700
Mary   369   Lunch          1200
Mary   369   ScienceClass   1300
Mary   369   EnglishClass   1600
Mary   369   EndOfSchool    1700
Now I want to write one SQL query that displays the following:
Name   ID    ActivityCode   StartTime   EndTime
Tom    123   Lunch          1200        1300
Tom    123   MathClass      1300        1500
Tom    123   EnglishClass   1500        1700
Tom    123   EndOfSchool    1700        1700
Mary   369   Lunch          1200        1300
Mary   369   ScienceClass   1300        1600
Mary   369   EnglishClass   1600        1700
Mary   369   EndOfSchool    1700        1700
I followed this code, credit to Gustav:
SELECT
Activity.ActivityCode,
Activity.StartTime,
Nz((Select Top 1 StartTime
From Activity As T
Where T.StartTime > Activity.StartTime
Order By StartTime Asc),
[StartTime]) AS EndTime,
CDate(TimeSerial(Val([EndTime])\100,Val([EndTime]) Mod 100,0)-
TimeSerial(Val([StartTime])\100,Val([StartTime]) Mod 100,0)) AS Duration
FROM
Activity;
I tried to modify the part
Order By StartTime Asc
because the whole query is sorted by the student ID, which comes from another table. But some message boxes popped up and I couldn't solve it. How can I modify it? Thank you.
Consider this SQL:
SELECT Students.ID, [Name], ActivityCode, StartTime,
Nz((Select Top 1 StartTime FROM Activity As T
WHERE T.StartTime > Activity.StartTime AND T.ID=Activity.ID
ORDER BY StartTime Asc),StartTime) AS EndTime,
DateDiff("h",TimeSerial(StartTime/100,0,0),TimeSerial(EndTime/100,0,0)) AS Duration
FROM Students INNER JOIN Activity ON Students.ID=Activity.ID;

How Can I Retrieve The Earliest Date and Status Per Each Distinct ID

I have been trying to write a query for this case but can't get it right: I am still receiving duplicates. I'm hoping someone can help me fix this.
SELECT DISTINCT
s.Client,
s.ID,
s.Thing,
s.Status,
MIN(s.StatusDate) as 'statdate'
FROM
SAMPLE s
WHERE
[]
GROUP BY
s.Client,
s.ID,
s.Thing,
s.Status
My output is as follows
Client Id Thing Status Statdate
CompanyA 123 Thing1 Approved 12/9/2019
CompanyA 123 Thing1 Denied 12/6/2019
So although the query is doing what I asked and showing the minimum status date per status, I want only the first status date for each ID. I have about 30k rows to filter through, so I need something that won't overload the query and stop it from running. Any help would be appreciated.
Use window functions:
SELECT s.*
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY StatusDate) as seqnum
FROM SAMPLE s
WHERE []
) s
WHERE seqnum = 1;
This returns the first row for each id.
Use whichever of these you feel more comfortable with/understand:
SELECT
*
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY statusdate) as rn
FROM sample
WHERE ...
) x
WHERE rn = 1
The way that one works is to number all rows sequentially in order of StatusDate, restarting the numbering from 1 every time ID changes. If you then collect all the number 1's together, you have your set of "first records".
Or you can coordinate a MIN:
SELECT
*
FROM
sample s
INNER JOIN
(SELECT ID, MIN(statusDate) as minDate FROM sample WHERE ... GROUP BY ID) mins
ON s.ID = mins.ID and s.StatusDate = mins.MinDate
WHERE
...
This one prepares a list of every ID and its minimum date, then joins it back to the main table, so you get back all the data that was lost during the grouping operation. You cannot simultaneously "keep data" and "throw away data" during a group: if you group by more than just ID, you get more groups (as you have found), and if you group only by ID, you lose the other columns. There isn't any way to say "GROUP BY id, AND take the MIN date, AND also take all the other data from the same row as the min date" without doing a "group by id, take min date, then join this data set back to the main dataset to get the other data for that min date". If you try to do it all in a single grouping you'll fail, because you either have to group by more columns or use aggregate functions for the other data in the SELECT, which mixes your data up; once the groups are formed, the concept of "other data from the same row" is gone.
Be aware that this can return duplicate rows if two records share the same minimum date. The ROW_NUMBER form doesn't return duplicate records, but if two records have the same minimum StatusDate then which one you get is arbitrary. To force a specific one, ORDER BY additional columns so you can be sure which row ends up with 1.
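The difference in tie behaviour can be seen in a small sketch run against SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). The "Pending" row is invented to create a tie on the earliest date, and ordering by status as a tie-breaker is only one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, status TEXT, statusdate TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (123, "Denied", "2019-12-06"),
        (123, "Pending", "2019-12-06"),   # hypothetical row that ties on the earliest date
        (123, "Approved", "2019-12-09"),
    ],
)

# MIN + join: every row that matches the minimum date comes back.
join_rows = conn.execute("""
SELECT s.id, s.status, s.statusdate
FROM sample s
INNER JOIN (SELECT id, MIN(statusdate) AS mindate FROM sample GROUP BY id) mins
        ON s.id = mins.id AND s.statusdate = mins.mindate
""").fetchall()

# ROW_NUMBER: exactly one row per id; the extra ORDER BY column (status)
# makes the choice between tied rows deterministic.
rn_rows = conn.execute("""
SELECT id, status, statusdate
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY statusdate, status) AS rn
      FROM sample s) x
WHERE rn = 1
""").fetchall()

print(len(join_rows), rn_rows)
```

Here the join form returns both rows dated 2019-12-06, while the ROW_NUMBER form returns only the "Denied" row because it sorts first on the tie-breaker.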

Group records for hourly count

My goal is to build an hourly count for records that have a start date/time and an end date/time. The actual records are never more than 24 hours from start to finish but many times are less. It works if I bounce every record against my "clock" which has 24 slots for every date up to "today". But it can take forever to run as there can be 2000 records in a day.
This is the detail I get:
The date/times in green are what I want as the start date/time for a group. The blue date/times are what I want as the end date time for the group.
Like this:
I have tried partitioning but because, in the second pic, the 4th row has the same values as the 2nd row, it groups them together even though there is a time span between them - the third row.
This is a gaps-and-islands problem. The start and end dates match on adjacent rows, so a difference of row numbers seems sufficient:
select id, min(startdatetime), max(enddatetime),
d_id, class, location
from (select t.*,
row_number() over (partition by id order by startdatetime) as seqnum,
row_number() over (partition by id, d_id, class, location order by startdatetime) as seqnum_2
from t
) t
group by id, d_id, class, location, (seqnum - seqnum_2)
order by id, min(startdatetime);
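As a rough illustration of why the difference of row numbers separates the islands, here is a sketch run against SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). The table is reduced to a single grouping column, `location`, and the rows are invented; the fourth row repeats the location of the first two but is separated from them by the third.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, startdatetime TEXT, enddatetime TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01 08:00", "2020-01-01 09:00", "A"),
        (1, "2020-01-01 09:00", "2020-01-01 10:00", "A"),
        (1, "2020-01-01 10:00", "2020-01-01 11:00", "B"),
        (1, "2020-01-01 11:00", "2020-01-01 12:00", "A"),  # same location, later island
    ],
)

# seqnum counts rows per id; seqnum_2 counts rows per id and location.
# Their difference stays constant within an unbroken run and changes
# whenever a different location interrupts the run.
islands = conn.execute("""
SELECT id, location,
       MIN(startdatetime) AS island_start,
       MAX(enddatetime)   AS island_end
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdatetime) AS seqnum,
             ROW_NUMBER() OVER (PARTITION BY id, location
                                ORDER BY startdatetime) AS seqnum_2
      FROM t) numbered
GROUP BY id, location, seqnum - seqnum_2
ORDER BY id, island_start
""").fetchall()

for row in islands:
    print(row)
```

The differences for the four rows are 0, 0, 2 and 1, so the two "A" runs land in separate groups even though their grouping columns are identical.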

Latest date and time in SQL without ORDER BY

I'm trying to find a way to display the last event held (last date and time) in an events table whilst displaying all the columns for that event without using ORDER BY.
For example:
SELECT * from Events
where dateheld in (select max(dateheld) from events)
AND starttime in (select max(starttime) from events)
When I put MAX starttime, it displays nothing. When I put MIN starttime it works but displays the earliest time of that date and not the latest.
I guess you could print out your records, throw them down the stairs, and the ones that go farthest have the "lightest" dates. You cannot sort without order by. It's like wanting water that isn't wet. Unless your data naturally comes out in the order you want, you MUST sort.
Of course, if you want only the record that has the absolute most recent date, and don't need more than just that one record, then
SELECT yourdatetimefield, ...
FROM yourtable
WHERE yourdatetimefield = (SELECT MAX(yourdatetimefield) FROM yourtable)
If you are only looking for the latest item:
EDIT: It gets a little more complicated when you have separate date and time fields, but this should work. It is a ridiculous kludge for a situation where date and time should be stored in one field.
SELECT *
FROM (
SELECT *
FROM Events
WHERE dateheld = (SELECT MAX(dateheld) FROM Events)
) temp
WHERE starttime = (SELECT MAX(starttime) FROM (
SELECT *
FROM Events
WHERE dateheld = (SELECT MAX(dateheld) FROM Events)
) temp2)
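A small sketch may help show why the query in the question returns nothing: the latest start time in the whole table need not fall on the latest date. The example below uses SQLite through Python's sqlite3 module; the `name` column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (name TEXT, dateheld TEXT, starttime TEXT)")
conn.executemany(
    "INSERT INTO Events VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", "20:00"),  # latest start time overall, but on an earlier date
        ("B", "2020-01-02", "09:00"),
        ("C", "2020-01-02", "14:00"),  # the event actually held last
    ],
)

# The query from the question: the two MAX subqueries are independent,
# so no single row has to satisfy both of them.
naive = conn.execute("""
SELECT * FROM Events
WHERE dateheld IN (SELECT MAX(dateheld) FROM Events)
  AND starttime IN (SELECT MAX(starttime) FROM Events)
""").fetchall()

# Restricting the MAX(starttime) search to the latest date fixes that.
latest = conn.execute("""
SELECT * FROM Events
WHERE dateheld = (SELECT MAX(dateheld) FROM Events)
  AND starttime = (SELECT MAX(starttime) FROM Events
                   WHERE dateheld = (SELECT MAX(dateheld) FROM Events))
""").fetchall()

print(naive, latest)
```

With this data the first query returns no rows, while the second returns event C. The second query is a flattened form of the nested version above and avoids the derived-table aliases.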