Total number of days for a task before going on to the next one, grouped by person - sql

I am trying to figure out how to show how many days have been worked on a certain task by using the dates in between each “task login” for each person. I think this can be done with one query? I'm open to suggestions and/or ideas.
The Table:
--------+-----------+----------
Person | TaskLogin | Date
--------+-----------+----------
Jane | A | 2013-01-01
Jane | B | 2013-01-03
Jane | A | 2013-01-06
Jane | B | 2013-01-10
Bob | A | 2013-01-01
Bob | A | 2013-01-06
---------------------------------------------------------------------
Row 1: Jane starts task A starting 2013-01-01 and works on it until starting Task B on 2013-01-03 = 2 days worked on Task A
Row 2: Jane starts on task B starting 2013-01-03 and works on it until starting task A on 2013-01-06 = 3 days worked on Task B
Row 3: Jane starts on task A starting 2013-01-06 and works on it until starting task B on 2013-01-10 = 4 days worked on Task A
Row 4: Skip because that is the highest date for Jane (Jane may or may not finish task B 2013-01-10 but we will not count it)
Row 5: Bob starts task A starting on 2013-01-01 and works on it until continuing to work on task A by logging it again on 2013-01-06 = 5 days worked on task A
Row 6: Skip because that is the highest date for Bob
A = 11 days because 2 + 4 + 5
B = 3 days because of Row 2
The output:
------+---------------------
Tasks | Time between Tasks
------+---------------------
A | 11 days
B | 3 days
EDIT:
The solutions from Nicarus and Gordon Linoff (specifically the first, pre-2012 solution, with my edits in the comments) work. Note that in Gordon Linoff's solution, table can be replaced with (select distinct * from table t) t to handle the case of someone logging in twice on the same day.

What you are looking for is the lead() function. This is only available in SQL Server 2012 and later. Before that, the easiest way is a correlated subquery that picks the earliest later date for the same person:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*,
             (select top 1 t2.date
              from table t2
              where t2.person = t.person
                and t2.date > t.date
              order by t2.date asc
             ) as nextdate
      from table t
     ) t
where nextdate is not null
group by TaskLogin;
In SQL Server 2012 and later, it would be:
select TaskLogin, sum(datediff(day, date, nextdate)) as days
from (select t.*, lead(date) over (partition by person order by date) as nextdate
from table t
) t
where nextdate is not null
group by TaskLogin;
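As a rough cross-check of the "next date per person" idea, here is a minimal sketch that runs the same logic against the question's sample rows. It uses SQLite through Python's sqlite3 module rather than SQL Server, so julianday() stands in for DATEDIFF and MIN() replaces TOP 1; the table name task_log and the column names are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question: (person, task_login, log_date).
ROWS = [
    ("Jane", "A", "2013-01-01"),
    ("Jane", "B", "2013-01-03"),
    ("Jane", "A", "2013-01-06"),
    ("Jane", "B", "2013-01-10"),
    ("Bob", "A", "2013-01-01"),
    ("Bob", "A", "2013-01-06"),
]

# For each row, find the earliest later login of the same person,
# then sum the day differences per task. Rows without a later login are skipped.
QUERY = """
SELECT x.task_login,
       CAST(SUM(julianday(x.nextdate) - julianday(x.log_date)) AS INTEGER) AS days
FROM (SELECT t.person, t.task_login, t.log_date,
             (SELECT MIN(t2.log_date)
              FROM task_log AS t2
              WHERE t2.person = t.person
                AND t2.log_date > t.log_date) AS nextdate
      FROM task_log AS t) AS x
WHERE x.nextdate IS NOT NULL
GROUP BY x.task_login
ORDER BY x.task_login
"""


def days_per_task(rows):
    """Load the rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_log (person TEXT, task_login TEXT, log_date TEXT)"
    )
    conn.executemany("INSERT INTO task_log VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(days_per_task(ROWS))  # [('A', 11), ('B', 3)]
```

The result matches the expected output in the question: 11 days for A (2 + 4 + 5) and 3 days for B.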

Maybe not the most elegant way, but it certainly works:
-- Setup table/insert values --
IF OBJECT_ID('TempDB.dbo.#TaskAccounting') IS NOT NULL BEGIN
DROP TABLE #TaskAccounting
END
CREATE TABLE #TaskAccounting
(
Person VARCHAR(4) NOT NULL,
TaskLogin CHAR(1) NOT NULL,
TaskDate DATETIME NOT NULL
)
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-03')
INSERT INTO #TaskAccounting
VALUES ('Jane','A','2013-01-06')
INSERT INTO #TaskAccounting
VALUES ('Jane','B','2013-01-10')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-01')
INSERT INTO #TaskAccounting
VALUES ('Bob','A','2013-01-06');
-- Use a CTE to add sequence and join on it --
WITH Tasks AS (
SELECT
Person,
TaskLogin,
TaskDate,
ROW_NUMBER() OVER(PARTITION BY Person ORDER BY TaskDate) AS Sequence
FROM
#TaskAccounting
)
SELECT
a.TaskLogin AS Tasks,
CAST(SUM(DATEDIFF(DD,a.TaskDate,b.TaskDate)) AS VARCHAR) + ' days' AS TimeBetweenTasks
FROM
Tasks a
JOIN
Tasks b
ON (a.Person = b.Person)
AND (a.Sequence = b.Sequence - 1)
GROUP BY
a.TaskLogin

Related

Selecting the most recent date

I have data structured like this:
ID | Enrolment_Date | Appointment1_Date | Appointment2_Date | .... | Appointment150_Date |
112 01/01/2015 01/02/2015 01/03/2018 01/08/2018
113 01/06/2018 01/07/2018 NULL NULL
114 01/04/2018 01/05/2018 01/06/2018 NULL
I need a new variable which counts the number of months between the enrolment_date and the most recent appointment. The challenge is that all individuals have a different number of appointments.
Update: I agree with the comments that this is poor table design and it needs to be reformatted. Could proposed solutions please include suggested code on how to transform the table?
Since the OP is currently stuck with this bad design, I will point out a temporary solution. As others have suggested, you really must change the structure here. For now, this will suffice:
SELECT '['+ NAME + '],' FROM sys.columns WHERE OBJECT_ID = OBJECT_ID ('TableA') -- find all columns, last one probably max appointment date
SELECT ID,
Enrolment_Date,
CASE WHEN Appointment150_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment150_Date)
WHEN Appointment149_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment149_Date)
WHEN Appointment148_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment148_Date)
WHEN Appointment147_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment147_Date)
WHEN Appointment146_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment146_Date)
WHEN Appointment145_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment145_Date)
WHEN Appointment144_Date IS NOT NULL THEN DATEDIFF (MONTH, Enrolment_Date, Appointment144_Date) -- and so on
END AS NumberOfMonths
FROM TableA
This is a very ugly temporary solution and should be considered as such.
You will need to restructure your data; the given structure is poor database design. Create two separate tables, one called users and one called appointments. The users table contains the user id, enrollment date and any other user-specific information. Each row in the appointments table contains the user's unique id and a specific appointment date. Structuring your tables like this will make it easier to write a query to get days/months since the last appointment.
For example:
Users Table:
ID, Enrollment_Date
1, 2018-01-01
2, 2018-03-02
3, 2018-05-02
Appointments Table:
ID, Appointment_Date
1, 2018-01-02
1, 2018-02-02
1, 2018-02-10
2, 2018-05-01
You would then be able to write a query that joins the two tables together and calculates the difference between the enrollment date and the max value of the appointment date (the most recent appointment).
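A minimal sketch of such a query, using the example Users and Appointments rows above. It runs on SQLite through Python's sqlite3 module, which is an assumption made so the example is self-contained; the column name user_id and the strftime-based month arithmetic are stand-ins for the appointments ID column and SQL Server's DATEDIFF(month, ...), which counts month boundaries crossed.

```python
import sqlite3

SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, enrollment_date TEXT NOT NULL);
CREATE TABLE appointments (user_id INTEGER NOT NULL, appointment_date TEXT NOT NULL);
INSERT INTO users VALUES (1, '2018-01-01'), (2, '2018-03-02'), (3, '2018-05-02');
INSERT INTO appointments VALUES
    (1, '2018-01-02'), (1, '2018-02-02'), (1, '2018-02-10'), (2, '2018-05-01');
"""

# Latest appointment per user, then month boundaries between enrolment and it.
# LEFT JOIN keeps users without appointments (months is NULL for them).
QUERY = """
SELECT u.id,
       (CAST(strftime('%Y', l.last_date) AS INTEGER)
        - CAST(strftime('%Y', u.enrollment_date) AS INTEGER)) * 12
       + (CAST(strftime('%m', l.last_date) AS INTEGER)
          - CAST(strftime('%m', u.enrollment_date) AS INTEGER)) AS months
FROM users AS u
LEFT JOIN (SELECT user_id, MAX(appointment_date) AS last_date
           FROM appointments
           GROUP BY user_id) AS l ON l.user_id = u.id
ORDER BY u.id
"""


def months_to_last_appointment():
    """Build the two example tables in memory and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(months_to_last_appointment())  # [(1, 1), (2, 2), (3, None)]
```

User 3 has no appointments, so the month count comes back as NULL (None in Python) instead of the row being dropped.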
It is better if you can create two tables.
Enrolment Table (dbo.Enrolments)
ID | EnrolmentDate
1 | 2018-08-30
2 | 2018-08-31
Appointments Table (dbo.Appointments)
ID | EnrolmentID | AppointmentDate
1 | 1 | 2018-09-02
2 | 1 | 2018-09-03
3 | 2 | 2018-09-01
4 | 2 | 2018-09-03
Then you can try something like this.
If you want the count of months from the enrolment date to the final appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MAX(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
If you want the count of months from the enrolment date to the earliest appointment date, use the query below.
SELECT E.ID, E.EnrolmentDate, A.NoOfMonths
FROM dbo.Enrolments E
OUTER APPLY
(
SELECT DATEDIFF(mm, E.EnrolmentDate, MIN(A.AppointmentDate)) AS NoOfMonths
FROM dbo.Appointments A
WHERE A.EnrolmentId = E.ID
) A
Try this on sqlfiddle
You have a lousy data structure, as others have noted. You really want a table with one row per appointment. After all, what happens after the 150th appointment?
select t.id, t.Enrolment_Date,
datediff(month, t.Enrolment_Date, m.max_Appointment_Date) as months_diff
from t cross apply
(select max(Appointment_Date) as max_Appointment_Date
from (values (Appointment1_Date),
(Appointment2_Date),
. . .
(Appointment150_Date)
) v(Appointment_Date)
) m;
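Since the question also asks how to transform the table, here is a minimal sketch of the unpivot step in plain Python. The three appointment columns and the sample values are hypothetical (the real table has up to 150 columns), and dates are assumed to be day-first; the month count uses the same month-boundary rule as DATEDIFF(month, ...).

```python
from datetime import date


def unpivot(wide_rows):
    """Turn one-column-per-appointment rows into (id, appointment_date) rows."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            # Skip the ID and enrolment columns and any empty appointment slots.
            if column.startswith("Appointment") and value is not None:
                long_rows.append((row["ID"], value))
    return long_rows


def months_to_last_appointment(wide_rows):
    """Months between enrolment and the most recent appointment, per ID."""
    last = {}
    for person_id, appointment in unpivot(wide_rows):
        if person_id not in last or appointment > last[person_id]:
            last[person_id] = appointment
    result = {}
    for row in wide_rows:
        enrol = row["Enrolment_Date"]
        latest = last.get(row["ID"])
        if latest is None:
            result[row["ID"]] = None  # no appointments recorded
        else:
            result[row["ID"]] = (
                (latest.year - enrol.year) * 12 + (latest.month - enrol.month)
            )
    return result


# Hypothetical sample rows in the wide layout.
WIDE = [
    {"ID": 112, "Enrolment_Date": date(2015, 1, 1),
     "Appointment1_Date": date(2015, 2, 1),
     "Appointment2_Date": date(2018, 3, 1),
     "Appointment3_Date": date(2018, 8, 1)},
    {"ID": 113, "Enrolment_Date": date(2018, 6, 1),
     "Appointment1_Date": date(2018, 7, 1),
     "Appointment2_Date": None,
     "Appointment3_Date": None},
    {"ID": 114, "Enrolment_Date": date(2018, 4, 1),
     "Appointment1_Date": date(2018, 5, 1),
     "Appointment2_Date": date(2018, 6, 1),
     "Appointment3_Date": None},
]

print(months_to_last_appointment(WIDE))  # {112: 43, 113: 1, 114: 2}
```

The list returned by unpivot() is the shape a one-row-per-appointment table would have, so it could be bulk-inserted into such a table as a one-off migration.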

Join only top (latest) values for each row in other table

I am new to SQL, so the existing answers are a bit complicated for me.
I have three tables:
WORKER
|id
|name
|date
|...
JOB
|id
|name
|salary
|accept
APPOINTMENT
|id
|worker_id
|job_id
|date
So if a worker was appointed several times in a year, I need to know what job he had at some specified time.
I have something like this now:
SELECT w.name,w.id FROM worker w
INNER JOIN appointment a ON w.id = worker_id
INNER JOIN job j on job_id = j.id
WHERE accept = 1 AND a.date <= (SELECT date FROM orders WHERE id = 2);
Now it shows all appointments on or before some date, but I need only the last one for each worker.
How do I need to modify it?
EDIT:
ORDER
|id
|accepted_by //worker_id
|...
Orders is used just to get a date. The date could come from any other source, so it isn't important in this context.
Accept in Job is just a bool value; it indicates that the appointed worker can accept new orders.
So the full purpose is to show, in a ComboBox on the edit form, all workers who were able to accept an order at the time the order was created (not just workers who can accept it now).
Date is represented as an integer number of days since 1970.
Rows that are candidates for the output:
w.name w.id a.id a.date j.name j.accept
Smith 2 7 42999 administrator 1
Joe 1 6 42994 administrator 1
Smith 2 5 42994 waiter 0
Joe 1 4 42993 waiter 0
Smith 2 3 42992 administrator 1
Smith 2 2 42991 waiter 0
Smith 2 1 42990 administrator 1
What I receive with my query (listed above this edit) for a.date <= 42998 and accept = 1:
Joe 1 6 42994 administrator 1
Smith 2 3 42992 administrator 1 //isn't Smith's current job
Smith 2 1 42990 administrator 1 //isn't Smith's current job
What I should receive for a.date <= 42998:
//Last job in 42998
Joe 1 6 42994 administrator 1
Smith 2 5 42994 waiter 0
What I should receive for a.date <= 42999:
//Last job in 42999
Smith 2 7 42999 administrator 1
Joe 1 6 42994 administrator 1
What I finally need to receive (a.date <= 42998 and accept = 1):
//Workers who were able to accept an order in 42998
Joe 1 6 42994 administrator 1
What I should receive for a.date <= 42999 and accept = 1:
//Workers who were able to accept an order in 42999
Smith 2 7 42999 administrator 1
Joe 1 6 42994 administrator 1
Tables (all unused fields are removed):
CREATE TABLE appointment (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, worker_id INTEGER NOT NULL, job_id INTEGER, date INTEGER NOT NULL);
CREATE TABLE worker (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,name TEXT NOT NULL);
CREATE TABLE job (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,name TEXT NOT NULL,accept INTEGER NOT NULL);
Inserts (like in example above):
INSERT INTO worker (name) VALUES ('Joe');
INSERT INTO worker (name) VALUES ('Smith');
INSERT INTO job (name,accept) VALUES ('waiter',0);
INSERT INTO job (name,accept) VALUES ('administrator',1);
INSERT INTO appointment (worker_id,job_id,date) VALUES (2,2,42990);
INSERT INTO appointment (worker_id,job_id,date) VALUES (2,1,42991);
INSERT INTO appointment (worker_id,job_id,date) VALUES (2,2,42992);
INSERT INTO appointment (worker_id,job_id,date) VALUES (1,1,42993);
INSERT INTO appointment (worker_id,job_id,date) VALUES (2,1,42994);
INSERT INTO appointment (worker_id,job_id,date) VALUES (1,2,42994);
INSERT INTO appointment (worker_id,job_id,date) VALUES (2,2,42999);
To get the last appointment for each worker, use grouping (in SQLite, the bare job_id column is taken from the row that holds the MAX(date) value). Filter out the non-accepted ones in a second step:
SELECT ...
FROM (SELECT worker_id,
job_id,
MAX(date) AS date
FROM appointment
WHERE date <= ...
GROUP BY worker_id) AS a
JOIN worker AS w ON a.worker_id = w.id
JOIN job AS j ON a.job_id = j.id
WHERE accept = 1;
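The two-step idea (latest appointment per worker first, accept filter second) can be tried against the question's sample data with the sketch below. It is a variant, not the answer's exact query: instead of relying on SQLite's bare-column behaviour, it joins the grouped result back to appointment on worker_id and date, which also gives access to a.id. The function name and the cutoff parameter are assumptions for the example.

```python
import sqlite3

SETUP = """
CREATE TABLE worker (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT NOT NULL);
CREATE TABLE job (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT NOT NULL,
                  accept INTEGER NOT NULL);
CREATE TABLE appointment (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
                          worker_id INTEGER NOT NULL, job_id INTEGER,
                          date INTEGER NOT NULL);
INSERT INTO worker (name) VALUES ('Joe'), ('Smith');
INSERT INTO job (name, accept) VALUES ('waiter', 0), ('administrator', 1);
INSERT INTO appointment (worker_id, job_id, date) VALUES
    (2, 2, 42990), (2, 1, 42991), (2, 2, 42992), (1, 1, 42993),
    (2, 1, 42994), (1, 2, 42994), (2, 2, 42999);
"""

# Step 1 (subquery): latest appointment date per worker up to the cutoff.
# Step 2 (outer WHERE): keep only workers whose job at that time accepts orders.
QUERY = """
SELECT w.name, w.id, a.id, a.date, j.name, j.accept
FROM appointment AS a
JOIN (SELECT worker_id, MAX(date) AS last_date
      FROM appointment
      WHERE date <= ?
      GROUP BY worker_id) AS latest
  ON latest.worker_id = a.worker_id AND latest.last_date = a.date
JOIN worker AS w ON w.id = a.worker_id
JOIN job AS j ON j.id = a.job_id
WHERE j.accept = 1
ORDER BY a.date DESC
"""


def workers_able_to_accept(cutoff):
    """Workers whose most recent job on or before `cutoff` has accept = 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return rows


print(workers_able_to_accept(42998))  # [('Joe', 1, 6, 42994, 'administrator', 1)]
print(workers_able_to_accept(42999))
```

Filtering on accept inside the subquery instead would bring back Smith's older administrator rows, which is the problem described in the question.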

Report on a point in time

I am about to create what I assume will be 2 new tables in SQL. The idea is for one to be the "live" data and a second which would hold all the changes. Dates are in DD/MM/YYYY format.
Active
ID | Name | Start Date | End Date
1 Zac 1/1/2016 -
2 John 1/5/2016 -
3 Sam 1/6/2016 -
4 Joel 1/7/2016 -
Changes
CID | UID | Name | Start Date | End Date
1 1 Zac 1/1/2016 -
2 4 Joel 1/1/2016 -
3 4 Joel - 1/4/2016
4 2 John 1/5/2016 -
5 3 Sam 1/6/2016 -
6 4 Joel 1/7/2016 -
In the above situation you can see that Joel worked from the 1/1/2016 until the 1/4/2016, took 3 months off and then worked from the 1/7/2016.
I need to build a query whereby I can pick a point in time and report on who was working at that time. The above table only lists the name, but there will be many more columns to report on for a point in time.
What would be the best way to structure the tables to achieve this query?
I started writing this last night and am finally coming back to it. Basically you would have to use your change table to create a Slowly Changing Dimension and then generate a row number to match your starts and ends. This assumes, however, that your DB will never get out of sync by adding 2 start records or 2 end records in a row.
This also assumes you are using an RDBMS that supports common table expressions and window functions, such as SQL Server, Oracle, PostgreSQL, or DB2. The query below is written for SQL Server (ISNULL, GETDATE); other systems need their own equivalents.
WITH cte AS (
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY UID ORDER BY ISNULL(StartDate,EndDate)) As RowNum
FROM
Changes c
)
SELECT
s.UID
,s.Name
,s.StartDate
,COALESCE(e.EndDate,GETDATE()) as EndDate
FROM
cte s
LEFT JOIN cte e
ON s.UID = e.UID
AND s.RowNum + 1 = e.RowNum
WHERE
s.StartDate IS NOT NULL
AND '2016-05-05' BETWEEN s.StartDate AND COALESCE(e.EndDate,GETDATE())
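The pairing logic in the query above (number each person's change rows by date, then match every start row with the row that follows it) can be sketched in plain Python as follows. The tuple layout, the function names, and the explicit today parameter (standing in for GETDATE()) are assumptions for the example; the rows are the Changes table from the question with day-first dates.

```python
from collections import defaultdict
from datetime import date

# (cid, uid, name, start_date, end_date) rows from the Changes table.
CHANGES = [
    (1, 1, "Zac", date(2016, 1, 1), None),
    (2, 4, "Joel", date(2016, 1, 1), None),
    (3, 4, "Joel", None, date(2016, 4, 1)),
    (4, 2, "John", date(2016, 5, 1), None),
    (5, 3, "Sam", date(2016, 6, 1), None),
    (6, 4, "Joel", date(2016, 7, 1), None),
]


def build_periods(changes, today):
    """Pair each start row with the next row for the same UID."""
    by_uid = defaultdict(list)
    for row in changes:
        by_uid[row[1]].append(row)

    periods = []
    for uid, rows in by_uid.items():
        # Same ordering as ROW_NUMBER() ... ORDER BY ISNULL(StartDate, EndDate).
        rows.sort(key=lambda r: r[3] or r[4])
        for i, (_cid, _uid, name, start, _end) in enumerate(rows):
            if start is None:
                continue  # end rows only close the preceding start row
            following = rows[i + 1] if i + 1 < len(rows) else None
            if following is not None and following[4] is not None:
                end = following[4]
            else:
                end = today  # still open, like COALESCE(e.EndDate, GETDATE())
            periods.append((uid, name, start, end))
    return periods


def working_on(changes, day, today):
    """Names of everyone with a period that covers `day`."""
    return sorted(
        name
        for _uid, name, start, end in build_periods(changes, today)
        if start <= day <= end
    )


print(working_on(CHANGES, date(2016, 5, 5), today=date(2016, 12, 31)))
# ['John', 'Zac']
```

On 5 May 2016 Joel is on his break and Sam has not started yet, so only John and Zac are returned.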

How to get the count of distinct values until a time period Impala/SQL?

I have a raw table recording customer ids coming to a store over a particular time period. Using Impala, I would like to calculate the number of distinct customer IDs coming to the store until each day. (e.g., on day 3, 5 distinct customers visited so far)
Here is a simple example of the raw table I have:
Day ID
1 1234
1 5631
1 1234
2 1234
2 4456
2 5631
3 3482
3 3452
3 1234
3 5631
3 1234
Here is what I would like to get:
Day Count(distinct ID) until that day
1 2
2 3
3 5
Is there way to easily do this in a single query?
I'm not 100% sure this will work on Impala, but it should if you have a days table or a way to create a derived table on the fly in Impala:
CREATE TABLE days ("DayC" int);
INSERT INTO days
("DayC")
VALUES (1), (2), (3);
OR
CREATE TABLE days AS
SELECT DISTINCT "Day"
FROM sales
You can then use this query (SqlFiddle demo in PostgreSQL):
SELECT "DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN days
WHERE "Day" <= "DayC"
GROUP BY "DayC"
OUTPUT
| DayC | count |
|------|-------|
| 1 | 2 |
| 2 | 3 |
| 3 | 5 |
UPDATED VERSION
SELECT T."DayC", COUNT(DISTINCT "ID")
FROM sales
cross JOIN (SELECT DISTINCT "Day" as "DayC" FROM sales) T
WHERE "Day" <= T."DayC"
GROUP BY T."DayC"
Try this one (note that it counts the distinct IDs within each day, not the running total up to that day):
select day, count(distinct(id)) from yourtable group by day
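To compare the two approaches on the question's sample data, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite is an assumption made so the example runs anywhere; it says nothing about whether Impala accepts the same join. The table name sales follows the first answer, and the lower-case column names are chosen for the example.

```python
import sqlite3

# (day, id) rows from the question.
VISITS = [
    (1, 1234), (1, 5631), (1, 1234),
    (2, 1234), (2, 4456), (2, 5631),
    (3, 3482), (3, 3452), (3, 1234), (3, 5631), (3, 1234),
]

# Cumulative: pair every visit with every day on or after it, then count.
CUMULATIVE = """
SELECT d.day_c, COUNT(DISTINCT s.id)
FROM sales AS s
CROSS JOIN (SELECT DISTINCT day AS day_c FROM sales) AS d
WHERE s.day <= d.day_c
GROUP BY d.day_c
ORDER BY d.day_c
"""

# Per day: distinct visitors on that day only.
PER_DAY = """
SELECT day, COUNT(DISTINCT id)
FROM sales
GROUP BY day
ORDER BY day
"""


def run(query):
    """Load the visits into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day INTEGER, id INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", VISITS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(CUMULATIVE))  # [(1, 2), (2, 3), (3, 5)]
print(run(PER_DAY))     # [(1, 2), (2, 3), (3, 4)]
```

Only the cross-join version reproduces the requested 2, 3, 5; the plain GROUP BY gives 4 for day 3 because customer 4456 did not visit that day.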

MAX on group returns multiple values with same date but different times

I have followed many of the excellent pieces of advice on this site about selecting the MAX from a group of rows.
I have a history file and I only want the top date and comments for each project number. I am creating a derived table in a Boxi universe from this information. It all goes pretty well but if there are two entries for the same day but with different times they are both returned. This duplicates that entry on the subsequent report. Is there some way to make the MAX command go down to the time level of the date field?
Database is SQL Server 2005
-------------Sql used for derived table
Select
Projectno, Comment, CreatedOn
from
ReportHistory
Where
ReportHistory.ItemName=('ProjectCode1')
and
CreatedOn in(Select max(CreatedOn) FROM ReportHistory group by Projectno)
-------------------Example database
Projectno Comment Created on
1 Started 2013-01-04 11:04:00
2 Late 2013-01-06 11:22:00
3 Late 2013-01-07 11:06:00
1 On Time 2013-01-08 11:01:00 *these two both get selected*
1 Late 2013-01-08 12:05:00 *these two both get selected*
3 Back on schedule 2013-01-08 14:20:00
2 Still overdue 2013-01-09 09:01:00
MAX on a DATETIME data type does take the time into account; that is not what's wrong with your query. The problem is that you are not ensuring that the max value for CreatedOn belongs to the correct ProjectNo. You could use analytic functions for this:
;WITH CTE AS
(
SELECT Projectno,
Comment,
CreatedOn,
ROW_NUMBER() OVER(PARTITION BY ProjectNo ORDER BY CreatedOn DESC) RN
FROM ReportHistory
WHERE ReportHistory.ItemName = 'ProjectCode1'
)
SELECT Projectno, Comment, CreatedOn
FROM CTE
WHERE RN = 1
Query for when a projectno never has two rows with the same date (SQL Fiddle example):
SELECT h.Projectno,
h.Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
ORDER BY h.Projectno
Result:
| PROJECTNO | COMMENT | CREATED ON |
-----------------------------------------------------------------
| 1 | Late | January, 08 2013 12:05:00+0000 |
| 2 | Still overdue | January, 09 2013 09:01:00+0000 |
| 3 | Back on schedule | January, 08 2013 14:20:00+0000 |
Query for when a projectno can have several rows with the same date:
SELECT h.Projectno,
MAX(h.Comment) AS Comment,
h.[Created on]
FROM ReportHistory h
WHERE h.[Created on] =(Select max(h2.[Created on])
FROM ReportHistory h2
WHERE h2.Projectno = h.Projectno )
GROUP BY h.Projectno,
h.[Created on]
ORDER BY h.Projectno
I think you receive duplicates when the dates in different projects are identical.
For example, add (4, 'On Time', '2013-01-08 11:01:00') to your data.
Then the result will be: SQLFiddle
But you need this result: SQLFiddle
SELECT *
FROM ReportHistory t
WHERE t.ItemName=('ProjectCode1')
AND EXISTS (
SELECT 1
FROM ReportHistory
WHERE projectNo = t.projectNo
GROUP BY projectNo
HAVING MAX(CreatedOn) = t.CreatedOn
)
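The effect described above can be reproduced with a small sketch on SQLite through Python's sqlite3 module (an assumption for portability; the question itself is on SQL Server 2005). It loads the question's rows plus the extra project 4 row, leaves out the ItemName filter for brevity, and compares the uncorrelated IN (...) from the question with a correlated subquery.

```python
import sqlite3

# Rows from the question plus the extra project 4 row suggested above.
ROWS = [
    (1, "Started", "2013-01-04 11:04:00"),
    (2, "Late", "2013-01-06 11:22:00"),
    (3, "Late", "2013-01-07 11:06:00"),
    (1, "On Time", "2013-01-08 11:01:00"),
    (1, "Late", "2013-01-08 12:05:00"),
    (3, "Back on schedule", "2013-01-08 14:20:00"),
    (2, "Still overdue", "2013-01-09 09:01:00"),
    (4, "On Time", "2013-01-08 11:01:00"),
]

# Any row matches if its timestamp is the maximum of ANY project.
UNCORRELATED = """
SELECT Projectno, Comment, CreatedOn
FROM ReportHistory
WHERE CreatedOn IN (SELECT MAX(CreatedOn) FROM ReportHistory GROUP BY Projectno)
ORDER BY Projectno, CreatedOn
"""

# A row matches only if its timestamp is the maximum of ITS OWN project.
CORRELATED = """
SELECT h.Projectno, h.Comment, h.CreatedOn
FROM ReportHistory AS h
WHERE h.CreatedOn = (SELECT MAX(h2.CreatedOn)
                     FROM ReportHistory AS h2
                     WHERE h2.Projectno = h.Projectno)
ORDER BY h.Projectno
"""


def run(query):
    """Load the rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ReportHistory (Projectno INTEGER, Comment TEXT, CreatedOn TEXT)"
    )
    conn.executemany("INSERT INTO ReportHistory VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(len(run(UNCORRELATED)))  # 5: project 1 appears twice
print(len(run(CORRELATED)))    # 4: one row per project
```

With the uncorrelated version, project 1's 11:01 row is returned because that timestamp happens to be project 4's maximum, even though it is not project 1's latest entry.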