sum time with specific delimiter - sql

I have a problem summing time values based on a specific condition.
I need to add up the work time per activity date, but only for rows whose approval status on that date is Approve.
For example, I have data like this:
-----------------------------------------------
| Activity Date | ApprovalStatus | WorkTime |
-----------------------------------------------
| 2017-01-06 | Rejected | 01:00:00 |
-----------------------------------------------
| 2017-01-06 | Approve | 03:00:00 |
-----------------------------------------------
| 2017-01-06 | Waiting | 02:00:00 |
-----------------------------------------------
| 2017-01-06 | Approve | 01:00:00 |
-----------------------------------------------
In this example, only the approved work time should be summed, so the expected result is 04:00:00, as shown below.
-----------------------------------------------
| Activity Date | ApprovalStatus | WorkTime |
-----------------------------------------------
| 2017-01-06 | Approved | 04:00:00 |
-----------------------------------------------
Is there any way to solve this problem?
PS: I am using SQL Server 2014. Hope you can help me, thank you!!

Try something like the following.
Schema:
SELECT * INTO #TAB FROM(
SELECT '2017-01-06' AS Activity_Date
, 'Rejected' AS ApprovalStatus
, '01:00:00' AS WorkTime
UNION ALL
SELECT '2017-01-06' , 'Approve' , '03:00:00'
UNION ALL
SELECT '2017-01-06' , 'Waiting' , '02:00:00'
UNION ALL
SELECT '2017-01-06' , 'Approve' , '01:00:00'
)A
Now sum the hours, grouping by the date:
SELECT [Activity_Date]
,CAST(DATEADD(HH,SUM( DATEDIFF(HH,'00:00:00',WorkTime)),'00:00:00') AS TIME(0))
FROM #TAB
WHERE ApprovalStatus='Approve'
GROUP BY [Activity_Date]
Result:
+---------------+------------------+
| Activity_Date | (No column name) |
+---------------+------------------+
| 2017-01-06 | 04:00:00 |
+---------------+------------------+
UPDATE:
The SUM function only accepts exact numeric or approximate numeric data types. It won't accept date or time data types for summation.
This is documented under SUM (Transact-SQL) on the Microsoft website:
SUM ( [ ALL | DISTINCT ] expression )
expression
Is a constant, column, or function, and any combination of
arithmetic, bitwise, and string operators. expression is an expression
of the exact numeric or approximate numeric data type category, except
for the bit data type. Aggregate functions and subqueries are not
permitted.
So you have to write your own logic to get the sum of time values. The query below sums the times at millisecond precision.
SELECT [Activity_Date]
,CAST(DATEADD(ms, SUM(DATEDIFF(ms, '00:00:00.000', WorkTime)), '00:00:00.000') as time(0))
FROM #TAB
WHERE ApprovalStatus='Approve'
GROUP BY [Activity_Date]
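To check the arithmetic outside the database, the same idea (convert each time to a duration since midnight, sum the durations, convert back) can be sketched in Python. This is only an illustration: the rows mirror the sample table from the question, and the helper names such as `sum_approved_worktime` are made up for this sketch.

```python
from datetime import timedelta

# Sample rows from the question: (activity_date, approval_status, work_time)
rows = [
    ("2017-01-06", "Rejected", "01:00:00"),
    ("2017-01-06", "Approve", "03:00:00"),
    ("2017-01-06", "Waiting", "02:00:00"),
    ("2017-01-06", "Approve", "01:00:00"),
]


def to_timedelta(hhmmss):
    """Turn an 'HH:MM:SS' string into a duration since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta):
    """Render a duration as HH:MM:SS (hours may exceed 23)."""
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def sum_approved_worktime(rows, status="Approve"):
    """Sum work time per activity date, counting only rows with the given status."""
    totals = {}
    for activity_date, approval_status, work_time in rows:
        if approval_status == status:
            totals[activity_date] = totals.get(activity_date, timedelta()) + to_timedelta(work_time)
    return {day: format_duration(total) for day, total in totals.items()}


print(sum_approved_worktime(rows))  # {'2017-01-06': '04:00:00'}
```

One difference worth keeping in mind: a SQL Server TIME value cannot represent 24 hours or more, so if the approved work time for a date can reach that size, the total needs another output format (for example a number of minutes or a formatted string).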

You can filter the records by ApprovalStatus and sum WorkTime, grouping by activity date.
Use this if you only want to add the hour part:
SELECT SUM(DATEDIFF(HH,'00:00:00',WorkTime)) AS [TotalWorktime]
FROM [YourTable]
WHERE ApprovalStatus = 'Approve'
GROUP BY [Activity Date]
OR
Use this if you also want to include the minutes. The result is written as hours.minutes, so a value such as 4.30 stands for 4 hours and 30 minutes:
SELECT SUM(DATEDIFF(MINUTE,'0:00:00',CONVERT(TIME,WorkTime)))/60 + (SUM(DATEDIFF(MINUTE,'0:00:00',CONVERT(TIME,WorkTime)))%60)/100.0 AS [TotalWorktime]
FROM [YourTable]
WHERE ApprovalStatus = 'Approve'
GROUP BY [Activity Date]

Related

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention rate of my users. For example, for all users with one or more check-ins, I want the percentage of users who checked in on their 2nd day, on their 3rd day, and so on.
My SQL skills are pretty basic as it's not a tool that I use that often in my day-to-day work, and I know that this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date |
=====================================
| 1 | 2020-09-02 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 12:00:00 |
-------------------------------------
| 1 | 2020-09-04 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 11:00:00 |
-------------------------------------
| ... |
-------------------------------------
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70% | 67 % | 44% | 32% |
---------------------------------
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days between checkins for users -- and users might have none -- then just use aggregation and window functions:
-- boolean cannot be cast straight to numeric in PostgreSQL, so go through int
select sum( (ci.date = ci.min_date)::int )::numeric / u.num_users as day_0,
       sum( (ci.date = ci.min_date + interval '1 day')::int )::numeric / u.num_users as day_1,
       sum( (ci.date = ci.min_date + interval '2 day')::int )::numeric / u.num_users as day_2
from (select u.*, count(*) over () as num_users
      from users u
     ) u left join
     (select ci.user_id, ci.date::date as date,
             -- first check-in date of each user
             min(min(ci.date::date)) over (partition by ci.user_id) as min_date
      from checkins ci
      group by ci.user_id, ci.date::date
     ) ci
     on ci.user_id = u.id
group by u.num_users;
Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per date.
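To sanity-check the logic on a small data set, here is a rough Python sketch of the same computation. The users and check-ins are invented for the illustration, and the function name `retention` is hypothetical. As in the query, the denominator is the number of all users, including those without any check-in.

```python
from datetime import date, timedelta

# Hypothetical data: user ids and (user_id, check-in date) pairs.
user_ids = [1, 2, 3, 4]
checkins = [
    (1, date(2020, 9, 2)),
    (1, date(2020, 9, 3)),
    (1, date(2020, 9, 4)),
    (2, date(2020, 9, 2)),
    (2, date(2020, 9, 4)),
    (4, date(2020, 9, 4)),
    (4, date(2020, 9, 4)),  # second check-in on the same day
]


def retention(user_ids, checkins, max_day):
    """Fraction of all users with a check-in exactly N days after their first one."""
    days_by_user = {}
    for user_id, day in checkins:
        # A set keeps one entry per user and date, like the GROUP BY in the query.
        days_by_user.setdefault(user_id, set()).add(day)

    rates = []
    for offset in range(max_day + 1):
        retained = sum(
            1
            for days in days_by_user.values()
            if min(days) + timedelta(days=offset) in days
        )
        rates.append(retained / len(user_ids))
    return rates


print(retention(user_ids, checkins, 2))  # [0.75, 0.25, 0.5]
```

If the percentages should be relative to users with at least one check-in instead, dividing by `len(days_by_user)` would be the corresponding change.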

SQL grouping by datetime with a maximum difference of x minutes

I have a problem with grouping my dataset in MS SQL Server.
My table looks like
# | CustomerID | SalesDate | Turnover
---| ---------- | ------------------- | ---------
1 | 1 | 2016-08-09 12:15:00 | 22.50
2 | 1 | 2016-08-09 12:17:00 | 10.00
3 | 1 | 2016-08-09 12:58:00 | 12.00
4 | 1 | 2016-08-09 13:01:00 | 55.00
5 | 1 | 2016-08-09 23:59:00 | 10.00
6 | 1 | 2016-08-10 00:02:00 | 5.00
Now I want to group the rows where the SalesDate differs from that of the next row by at most 5 minutes.
So that row 1 & 2, 3 & 4 and 5 & 6 are each one group.
My approach was getting the minutes with the DATEPART() function and divide the result by 5:
(DATEPART(MINUTE, SalesDate) / 5)
For row 1 and 2 the result would be 3 and grouping here would work perfectly.
But for the other rows where there is a change in the hour or even in the day part of the SalesDate, the result cannot be used for grouping.
So this is where I'm stuck. I would really appreciate, if someone could point me in the right direction.
You want to group adjacent transactions based on the timing between them. The idea is to assign some sort of grouping identifier, and then use that for aggregation.
Here is an approach:
1. Identify group starts using lag() and date arithmetic.
2. Do a cumulative sum of the group starts to identify each group.
3. Aggregate.
The query looks like this:
select customerid, min(salesdate), max(salesdate), sum(turnover)
from (select t.*,
             sum(case when salesdate > dateadd(minute, 5, prev_salesdate)
                      then 1 else 0
                 end) over (partition by customerid order by salesdate) as grp
      from (select t.*,
                   lag(salesdate) over (partition by customerid order by salesdate) as prev_salesdate
            from t
           ) t
     ) t
group by customerid, grp;
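The same idea (start a new group whenever the gap to the previous row exceeds five minutes, then aggregate per group) can be sketched in plain Python to check the expected grouping. The rows are the sample data from the question. The function name `group_sales` is an assumption made for this illustration.

```python
from datetime import datetime, timedelta

# Rows from the question: (customer_id, sales_date, turnover)
sales = [
    (1, datetime(2016, 8, 9, 12, 15), 22.50),
    (1, datetime(2016, 8, 9, 12, 17), 10.00),
    (1, datetime(2016, 8, 9, 12, 58), 12.00),
    (1, datetime(2016, 8, 9, 13, 1), 55.00),
    (1, datetime(2016, 8, 9, 23, 59), 10.00),
    (1, datetime(2016, 8, 10, 0, 2), 5.00),
]


def group_sales(sales, max_gap=timedelta(minutes=5)):
    """Return (customer_id, first_sale, last_sale, total_turnover) per group."""
    groups = []
    previous = None  # (customer_id, sales_date) of the previous row
    for customer_id, sales_date, turnover in sorted(sales):
        starts_group = (
            previous is None
            or previous[0] != customer_id
            or sales_date > previous[1] + max_gap
        )
        if starts_group:
            groups.append([customer_id, sales_date, sales_date, turnover])
        else:
            groups[-1][2] = sales_date   # extend the current group
            groups[-1][3] += turnover
        previous = (customer_id, sales_date)
    return [tuple(group) for group in groups]


for group in group_sales(sales):
    print(group)
```

Rows 1 and 2, 3 and 4, and 5 and 6 end up together, including the pair that crosses midnight, because the comparison is on the full timestamp and not on the minute part.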
EDIT
Thanks to @JoeFarrell for pointing out I have answered the wrong question. The OP is looking for dynamic time differences between rows, but this approach creates fixed boundaries.
Original Answer
You could create a time table. This is a table that contains one record for each second of the day. Your table would have a second column that you can use to perform group bys on.
CREATE TABLE [Time]
(
TimeId TIME(0) PRIMARY KEY,
TimeGroup TIME
)
;
-- You could use a loop here instead.
INSERT INTO [Time]
(
TimeId,
TimeGroup
)
VALUES
('00:00:00', '00:00:00'), -- First group starts here.
('00:00:01', '00:00:00'),
('00:00:02', '00:00:00'),
('00:00:03', '00:00:00'),
...
('00:04:59', '00:00:00'),
('00:05:00', '00:05:00'), -- Second group starts here.
('00:05:01', '00:05:00')
;
The approach works best when:
You need to reuse your custom grouping in several different queries.
You have two or more custom groups you often use.
Once populated you can simply join to the table and output the desired result.
/* Using the time table.
*/
SELECT
t.TimeGroup,
SUM(Turnover) AS SumOfTurnover
FROM
Sales AS s
INNER JOIN [Time] AS t ON t.TimeId = CAST(s.SalesDate AS Time(0))
GROUP BY
t.TimeGroup
;
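As a rough illustration of the "use a loop" remark, the (TimeId, TimeGroup) pairs for such a time table could be generated programmatically instead of typed by hand. The sketch below is Python and not part of the original answer. The function name `time_table_rows` and the 300-second bucket size are assumptions chosen to match the five-minute groups above.

```python
def time_table_rows(bucket_seconds=300):
    """Yield one (TimeId, TimeGroup) pair per second of the day.

    TimeGroup is the start of the fixed bucket that the second falls into.
    """
    def fmt(seconds):
        return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

    return [
        (fmt(second), fmt(second - second % bucket_seconds))
        for second in range(24 * 3600)
    ]


rows = time_table_rows()
print(len(rows))   # 86400
print(rows[299])   # ('00:04:59', '00:00:00')
print(rows[300])   # ('00:05:00', '00:05:00')
```

The generated pairs could then be inserted into the time table in bulk. As the edit above notes, these buckets have fixed boundaries, so two sales at 12:04 and 12:06 land in different groups.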

How to sort PostgreSQL data by weeks over month period

In simplest terms, I want to pull aggregate data from a table over a 4 week period but group by each week. It is safe to assume we can "force" a specific date or time (although it would be nice to allow any date entered and have the query run based on the date entered).
For example, the resulting data from a query would look like this:
start_date | end_date | count_of_sales
---------------------------------------------------------------
2014-03-03 04:00:00 | 2014-03-10 03:59:59 | 375
2014-03-10 04:00:00 | 2014-03-17 03:59:59 | 375
2014-03-17 04:00:00 | 2014-03-24 03:59:59 | 375
2014-03-24 04:00:00 | 2014-03-31 04:00:00 | 200
This would stem from unaggregated data that simply had a date (and of course other data but that is irrelevant):
saleDate | repID | productID
---------------------------------------------------------------
2014-03-04 12:36:33 | 1235 | 443
2014-03-09 07:08:12 | 1235 | 493
2014-03-09 10:12:44 | 3948 | 472
2014-03-21 23:33:01 | 2957 | 479
In my head the query would look SOMETHING (although not accurately) like this:
SELECT start_date, end_date, COUNT(*) FROM table WHERE date < '2014-03-31 04:00:00' GROUP BY date
I understand that the query above does not know how far back to look. Ideally the customer enters the final date and perhaps how many weeks of prior data they want to pull, which is why I left out a date BETWEEN clause (they may not know the exact 'start' date).
Sorry if this is confusing, but hopefully the sample SQL (albeit wrong) and the desired results give a clearer picture.
If I understood your question correctly, the following code should help.
For clarification: the code below is for SQL Server.
With CTE as
(
Select 1 as pID,'2014-03-03 04:00:00' as startDate,'2014-03-10 03:59:59' as endDate
Union All
Select 2,'2014-03-10 04:00:00','2014-03-17 03:59:59'
Union All
Select 3,'2014-03-17 04:00:00','2014-03-24 03:59:59'
Union All
Select 4,'2014-03-24 04:00:00','2014-03-31 04:00:00'
)
select a.pID,a.startDate,a.endDate,count(*) from CTE as a
inner join MyTable on myDateCol between a.startDate and a.endDate
group by a.pID,a.startDate,a.endDate
for demo SQL Fiddle
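The question also asks for the week boundaries to be derived from a final date and a number of weeks, instead of being hard-coded. A minimal sketch of that calculation in Python follows. The names `week_windows` and `count_by_week` are made up for the illustration, and the sale dates are the four sample rows from the question. Unlike the BETWEEN join above, it uses half-open windows (start included, end excluded), which is an assumption on my part to avoid both gaps and double counting at the boundaries.

```python
from datetime import datetime, timedelta


def week_windows(final_date, weeks):
    """Return (start, end) pairs of 7 days each, oldest first, ending at final_date."""
    return [
        (final_date - timedelta(weeks=n), final_date - timedelta(weeks=n - 1))
        for n in range(weeks, 0, -1)
    ]


def count_by_week(sale_dates, final_date, weeks):
    """Count sales per window; a sale belongs to a window if start <= sale < end."""
    return [
        (start, end, sum(1 for sale in sale_dates if start <= sale < end))
        for start, end in week_windows(final_date, weeks)
    ]


sale_dates = [
    datetime(2014, 3, 4, 12, 36, 33),
    datetime(2014, 3, 9, 7, 8, 12),
    datetime(2014, 3, 9, 10, 12, 44),
    datetime(2014, 3, 21, 23, 33, 1),
]

for start, end, count in count_by_week(sale_dates, datetime(2014, 3, 31, 4, 0, 0), 4):
    print(start, end, count)
```

With the sample rows this gives counts of 3, 0, 1 and 0 for the four weeks. Weeks without sales still appear here, whereas the inner join in the query above would leave them out.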

Joining two rows with same date/time

I've got a working query which is fine for now - and just about does what I'm looking for. I'm wanting to consult, however, as to whether this is the most sensible way of manipulating my data to have it spit out what I need:
I've got a table REPORTS which stores report data. One row gets inserted when a report is run, and another when a report is confirmed. Confirming a report simply involves inserting a reserved name TRUE with the same date as the report to be confirmed. Ugly, yes. But unfortunately, it's not up to me to decide...
Table structure:
Reports
UID (char)
Report (char)
Date (date)
On having run a report, the table REPORTS might look a little like this:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | runX | 2014-01-02 03:04:59 |
| 0001 | runY | 2014-01-02 03:05:58 |
| 0001 | runX | 2014-01-02 03:06:20 |
+------+--------+---------------------+
On action 'report confirm', the following rows would be inserted:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | TRUE | 2014-01-02 03:04:59 |
| 0001 | TRUE | 2014-01-02 03:05:58 |
| 0001 | TRUE | 2014-01-02 03:06:20 |
+------+--------+---------------------+
As you can see, when a report is marked TRUE (ie correct), there are two rows with exactly the same DATE:
+------+--------+---------------------+
| UID | Report | Date |
+------+--------+---------------------+
| 0001 | runX | 2014-01-02 03:04:59 |
| 0001 | TRUE | 2014-01-02 03:04:59 |
| 0001 | runY | 2014-01-02 03:05:58 |
| 0001 | TRUE | 2014-01-02 03:05:58 |
| 0001 | runX | 2014-01-02 03:06:20 |
| 0001 | TRUE | 2014-01-02 03:06:20 |
+------+--------+---------------------+
To return all reports which are 'correct' ie TRUE and identical date/time to report name eg 'runX', I do the following:
SELECT * FROM REPORTS T1
LEFT JOIN REPORTS T2
ON T1.DATE = T2.DATE
WHERE T1.REPORT = 'TRUE'
AND T1.REPORT != T2.REPORT;
This gives me something I can at least work with. I know, however, that there must be a more elegant way of doing this? The last clause, for example: not putting that in has it spit out a cartesian product, meaning I've created a cartesian product and am then filtering it. Presumably there must be a way of avoiding it completely and not creating it in the first place?
If I understand correctly, you want to pull the name from the record at the same time as the TRUE record and only return reports that actually have a TRUE record:
select uid,
max(case when Report <> 'TRUE' then Report end) as Report,
date
from reports r
group by uid, date
having sum(case when Report = 'TRUE' then 1 else 0 end) > 0;
Note: Doing equality comparisons on dates with a time component seems dangerous. The process that creates these tables should be putting some other link to the right report in the record. For instance, it could update a check flag column rather than create a new row.
EDIT:
Why is joining on dates (with times) a bad idea? Often, dates are shown as only dates, without the time component. That means that two dates can look the same in output, but really be different. Or, two dates can be in different time zones and look different but be the same.
Oracle mitigates the first problem by storing dates up to the second, in an exact format. Two dates that look the same to the second are the same. Equivalent data types in other databases sometimes include milliseconds -- although these are rarely printed out with the value. Two dates with times up to the second can look the same and still be different. In Oracle, you could say that two dates with times up to the minute can look the same and still be different.
The same phenomenon happens with floating point data types -- 1.0000000 and 0.9999999 are different, but they look the same when shown as 1.000. A join on these values would fail, even though looking at the values would suggest that it would succeed.
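The floating point case is easy to reproduce. The following Python snippet is only an illustration of the point above and is not tied to any particular database:

```python
import math

a = 1.0000000
b = 0.9999999

# Both values look identical when shown with three decimals...
print(f"{a:.3f}", f"{b:.3f}")  # 1.000 1.000

# ...but an equality comparison (which is what a join does) says they differ.
print(a == b)  # False

# A tolerance-based comparison treats them as the same value.
print(math.isclose(a, b, abs_tol=1e-6))  # True
```

Timestamps behave the same way: two values that print identically to the second can still differ in a hidden fractional part, and an equality join will not match them.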
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE reports ( "UID", Report, "Date" ) AS
SELECT '0001', 'runX', TO_DATE( '2014-01-02 03:04:59', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:04:59', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'runY', TO_DATE( '2014-01-02 03:05:58', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:05:58', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'runX', TO_DATE( '2014-01-02 03:06:20', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL
UNION ALL SELECT '0001', 'TRUE', TO_DATE( '2014-01-02 03:06:20', 'yyyy-mm-dd hh24:mi:ss' ) FROM DUAL;
Query 1:
SELECT "UID",
MAX( CASE Report WHEN 'TRUE' THEN NULL ELSE Report END ) AS Report,
"Date"
FROM reports
GROUP BY "UID", "Date"
HAVING MAX( CASE Report WHEN 'TRUE' THEN 1 ELSE 0 END ) = 1
Results:
| UID | REPORT | DATE |
|------|--------|--------------------------------|
| 0001 | runX | January, 02 2014 03:04:59+0000 |
| 0001 | runY | January, 02 2014 03:05:58+0000 |
| 0001 | runX | January, 02 2014 03:06:20+0000 |
Query 2:
Assuming that a report is marked as FALSE when it is not correct, you could do:
SELECT "UID",
Report,
"Date"
FROM ( SELECT "UID",
Report,
LEAD( Report )
OVER (
PARTITION BY "UID", "Date"
ORDER BY CASE Report
WHEN 'TRUE' THEN 2
WHEN 'FALSE' THEN 1
ELSE 0
END ) AS Result,
"Date"
FROM Reports )
WHERE Result = 'TRUE'
Results:
| UID | REPORT | DATE |
|------|--------|--------------------------------|
| 0001 | runX | January, 02 2014 03:04:59+0000 |
| 0001 | runY | January, 02 2014 03:05:58+0000 |
| 0001 | runX | January, 02 2014 03:06:20+0000 |

SQL: earliest date from set of date fields

I have a series of dates associated with a unique identifier in a table. For example:
1 | 1999-04-01 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2008-12-01 |
2 | 1999-04-06 | 2000-04-01 | 0000-00-00 | 0000-00-00 | 2010-04-03 |
3 | 1999-01-09 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2007-09-03 |
4 | 1999-01-01 | 0000-00-00 | 1997-01-01 | 0000-00-00 | 2002-01-04 |
Is there a way, to select the earliest date from the predefined list of DATE fields using a straightforward SQL command?
So the expected output would be:
1 | 1999-04-01
2 | 1999-04-06
3 | 1999-01-09
4 | 1997-01-01
I am guessing this is not possible but I wanted to ask and make sure. My current solution in mind involves putting all the dates in a temporary table and then using that to get the MIN()
thanks
Edit: The problem with using LEAST() as stated is that the new behaviour is to return NULL if any of the columns is NULL. In a series of dates like the dataset in question, any date might be NULL. I would like to obtain the earliest actual date from the set of dates.
SOLUTION: Used a combination of LEAST() and IF() in order to filter out NULL dates.
SELECT LEAST( IF(date1=0,NOW(),date1), IF(date2=0,NOW(),date2), [...] );
Lessons learnt a) COALESCE does not treat '0000-00-00' as a NULL date, b) LEAST will return '0000-00-00' as the smallest value - I would guess this is due to internal integer comparison(?)
select id, least(date_col_a, date_col_b, date_col_c) from table
upd
select id, least (
case when date_col_a = '0000-00-00' then now() + interval 100 year else date_col_a end,
case when date_col_b = '0000-00-00' then now() + interval 100 year else date_col_b end) from table
Actually, you can do it like below, or with a large CASE expression, or with least(date1, date2, dateN), although with least() NULL could come out as the minimum value.
select rowid, min(dt)
from
( select rowid, date1 as dt from table
union all
select rowid, date2 from table
union all
select rowid, date3 from table
/* and so on */
) d
group by rowid;
HTH
select
id,
least(coalesce(date1, '9999-12-31'), ....)
from
table
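All of the answers above boil down to the same rule: ignore the placeholder values (NULL or '0000-00-00') and take the minimum of what is left. A small Python sketch of that rule, using the sample rows from the question, makes the expected output easy to verify. The function name `earliest_real_date` is hypothetical. Dates are kept as ISO strings because '0000-00-00' is not a valid calendar date, and ISO-formatted strings sort in chronological order.

```python
PLACEHOLDERS = (None, "0000-00-00")

# Sample rows from the question: id followed by the date columns.
rows = [
    (1, "1999-04-01", "0000-00-00", "0000-00-00", "0000-00-00", "2008-12-01"),
    (2, "1999-04-06", "2000-04-01", "0000-00-00", "0000-00-00", "2010-04-03"),
    (3, "1999-01-09", "0000-00-00", "0000-00-00", "0000-00-00", "2007-09-03"),
    (4, "1999-01-01", "0000-00-00", "1997-01-01", "0000-00-00", "2002-01-04"),
]


def earliest_real_date(dates):
    """Return the smallest date that is not a placeholder, or None if there is none."""
    real_dates = [d for d in dates if d not in PLACEHOLDERS]
    return min(real_dates) if real_dates else None


for row_id, *dates in rows:
    print(row_id, earliest_real_date(dates))
```

This prints 1999-04-01, 1999-04-06, 1999-01-09 and 1997-01-01 for ids 1 to 4. The sentinel approach in the SQL answers (a far-future date such as '9999-12-31') achieves the same filtering inside a single LEAST() call, with the caveat that a row containing only placeholders returns the sentinel itself.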