date_entry  time_start  time_finished  idle_code  qty_good
8/8/2013    13:00       13:30          6          10
8/8/2013    13:30       15:20          0          20
8/8/2013    15:20       15:30          6          5
8/8/2013    15:30       16:25          0          10
8/8/2013    16:25       16:40          7          0
8/8/2013    16:40       17:25          0          40
8/8/2013    17:25       17:40          3          10
8/8/2013    17:40       24:00          1
8/8/2013    24:00       00:00          1
8/8/2013    00:00       00:30          1
Idle Time Legend:
0 Production
1 Adjustment/Mold
2 Machine
3 Quality Matter
4 Supply Matter
5 Mold Change
6 Replacer
7 Others
----------Result--------------------------------------
idle_code  total mins
1          410:00
2          00:00
3          15:00
4          00:00
5          00:00
6          40:00
7          15:00
0          210:00
First question: how do I group by idle_code and add up the total minutes?
---------other report----------------------------------
production efficiency report
idle_code  total mins
1          410:00
2          00:00
3          15:00
4          00:00
5          00:00
7          15:00
total idle time = 440:00 mins (formula: sum of total mins for idle_codes 1,2,3,4,5,7)
idle rate = 63.77% (formula: (total idle time / total actual production time) * 100)
total operation time = 250:00 mins (formula: sum of total mins for idle_code 0 and idle_code 6)
machine efficiency = 36.23% (formula: (total operation time / total actual production time) * 100)
total actual production time = 690:00 mins (formula: total idle time + total operation time)
This is easy to compute in PowerBuilder using a computed field, but my problem is how to group the rows by idle_code with their total minutes.
You could do this as a single SQL statement, summing the difference between the start and finish times, and grouping on idle_code. (Don't forget to make this a Left Outer Join from the Idle_Code table to the Production data table). This would save you from retrieving all the detail data to the client, and doing the grouping and summing there.
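A minimal sketch of that statement, assuming SQL Server-style DATEDIFF and hypothetical table names idle_legend (the code/description pairs) and production_log (the detail rows above); your actual schema will differ:

-- table and column names here are assumptions, not the real schema
SELECT il.idle_code,
       COALESCE(SUM(DATEDIFF(MINUTE, pl.time_start, pl.time_finished)), 0) AS total_mins
FROM idle_legend il
LEFT OUTER JOIN production_log pl
    ON pl.idle_code = il.idle_code
GROUP BY il.idle_code
ORDER BY il.idle_code;

The left outer join keeps codes with no detail rows (returned as 0), and as noted below this only works cleanly if the time columns are datetimes rather than bare times.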
If you need to do this as a computed column, and you've retrieved all the detail data, then create a group on idle_code, and create a computed column that sums (time_finished - time_start for group 1). The SecondsAfter() function can do this, if those columns are datetimes and not just time values.
How are you storing your time_start and time_finished columns? Are those datetime datatypes? Because that makes the calculations much easier. If they're just times, you'll have problems calculating the duration when those times cross midnight into the next day.
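If they are stored as bare times, one common workaround (a sketch only, again assuming SQL Server and the hypothetical production_log table) is to add a day's worth of minutes whenever the finish falls before the start:

-- duration in minutes for time-typed columns that may cross midnight
SELECT idle_code,
       SUM(CASE WHEN time_finished >= time_start
                THEN DATEDIFF(MINUTE, time_start, time_finished)
                ELSE DATEDIFF(MINUTE, time_start, time_finished) + 1440
           END) AS total_mins
FROM production_log
GROUP BY idle_code;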
---------related question-------------------------------
I have a pandas DataFrame with many rows. Each row holds an object and the duration of its machining on a certain machine (a start time and an end time). Each object can be processed on several machines in succession. I need to find the actual total duration of the work on each object.
For example:
Object  Machine  T start  T end
1       A        17:26    17:57
1       B        17:26    18:33
1       C        18:56    19:46
2       A        14:00    15:00
2       C        14:30    15:00
3       A        12:00    12:30
3       C        13:00    13:45
For object 1 the actual duration is 117 minutes, for object 2 it is 60 minutes, and for object 3 it is 75 minutes.
I tried a groupby in which I calculated, for each object, the sum of the process durations plus the minimum and maximum values, i.e. the first start and the last end. Then I wrote a function that compares these values, but while it works for objects 2 and 3, it fails for object 1.
Here is my solution:
Object  min    max    sumT  LT_ACTUAL
1       17:26  19:46  148   140  ERROR!
2       14:00  15:00  90    60   OK!
3       12:00  13:45  75    75   OK!
import pandas as pd

def calc_lead_time(min_t_start, max_t_end, t_sum):
    # elapsed minutes between the first start and the last end
    t_max_min = (max_t_end - min_t_start) / pd.Timedelta(minutes=1)
    if t_max_min <= t_sum:
        return t_max_min
    else:
        return t_sum

df['LT_ACTUAL'] = df.apply(lambda x: calc_lead_time(x['min'], x['max'], x['sumT']), axis=1)
I posted an image to explain all the cases. I need to calculate the actual duration across the tasks.
Assuming the data is sorted by start time, and that one task duration is not fully within another one, you can use:
# convert the 'HH:MM' strings to timedeltas
start = pd.to_timedelta(df['T start']+':00')
end = pd.to_timedelta(df['T end']+':00')
# start time of the next task within each Object
s = start.groupby(df['Object']).shift(-1)
# clip each end at the next start, then sum the per-task durations
(end.mask(end.gt(s), s).sub(start)
 .groupby(df['Object']).sum()
)
Output:
Object
1 0 days 01:57:00
2 0 days 01:00:00
3 0 days 01:15:00
dtype: timedelta64[ns]
For minutes:
start = pd.to_timedelta(df['T start']+':00')
end = pd.to_timedelta(df['T end']+':00')
s = start.groupby(df['Object']).shift(-1)
(end.mask(end.gt(s), s).sub(start)
.groupby(df['Object']).sum()
.dt.total_seconds().div(60)
)
Output:
Object
1 117.0
2 60.0
3 75.0
dtype: float64
handling overlapping intervals
See here for the logic of the overlapping intervals grouping.
(df.assign(
    start=pd.to_timedelta(df['T start']+':00'),
    end=pd.to_timedelta(df['T end']+':00'),
    # running maximum of end times within each Object
    max_end=lambda d: d.groupby('Object')['end'].cummax(),
    # new group whenever a task starts after everything seen so far has ended
    group=lambda d: d['start'].ge(d.groupby('Object')['max_end'].shift()).cumsum()
 )
 .groupby(['Object', 'group'])
 .apply(lambda g: g['end'].max()-g['start'].min())  # span of each merged block
 .groupby(level='Object').sum()
 .dt.total_seconds().div(60)
)
Output:
Object
1 117.0
2 60.0
3 75.0
4 35.0
dtype: float64
Used input:
Object Machine T start T end
0 1 A 17:26 17:57
1 1 B 17:26 18:33
2 1 C 18:56 19:46
3 2 A 14:00 15:00
4 2 C 14:30 15:00
5 3 A 12:00 12:30
6 3 C 13:00 13:45
7 4 A 12:00 12:30
8 4 C 12:00 12:15
9 4 D 12:20 12:35
def function1(dd: pd.DataFrame):
    # expand each interval into per-minute timestamps, excluding the start minute
    # (assumes 'T start'/'T end' are datetime columns)
    col1 = dd.apply(lambda ss: pd.date_range(ss["T start"] + pd.to_timedelta("1 min"),
                                             ss["T end"], freq="min"), axis=1).explode()
    min = col1.min() - pd.to_timedelta("1 min")  # earliest start
    max = col1.max()                             # latest end
    sumT = col1.size                             # total minutes, overlaps double-counted
    LT_ACTUAL = col1.drop_duplicates().size      # unique minutes = actual duration
    return pd.DataFrame({"min": min.strftime('%H:%M'), "max": max.strftime('%H:%M'),
                         "sumT": sumT, "LT_ACTUAL": LT_ACTUAL}, index=[dd.name])

df1.groupby('Object').apply(function1).droplevel(0)
out:
min max sumT LT_ACTUAL
1 17:26 19:46 148 117
2 14:00 15:00 90 60
3 12:00 13:45 75 75
Is there a way to transpose/flatten the following table -
userId  time window    propertyId  count  sum  avg  max
1       01:00 - 02:00  a           2      5    1.5  3
1       02:00 - 03:00  a           4      15   2.5  6
1       01:00 - 02:00  b           2      5    1.5  3
1       02:00 - 03:00  b           4      15   2.5  6
2       01:00 - 02:00  a           2      5    1.5  3
2       02:00 - 03:00  a           4      15   2.5  6
2       01:00 - 02:00  b           2      5    1.5  3
2       02:00 - 03:00  b           4      15   2.5  6
to something like this -
userId  time window    a_count  a_sum  a_avg  a_max  b_count  b_sum  b_avg  b_max
1       01:00 - 02:00  2        5      1.5    3      2        5      1.5    3
1       02:00 - 03:00  4        15     2.5    6      4        15     2.5    6
2       01:00 - 02:00  2        5      1.5    3      2        5      1.5    3
2       02:00 - 03:00  4        15     2.5    6      4        15     2.5    6
Basically, I want to flatten the table by having the aggregation columns (count, sum, avg, max) per propertyId, so the new columns are a_count, a_sum, a_avg, a_max, b_count, b_sum, ... All the rows have these values per userId per time window.
Important clarification: The values in propertyId column can change and hence, the number of columns can change as well. So, if there are n different values for propertyId, then there will be n*4 aggregation columns created.
SQL does not allow a dynamic number of result columns on principle: it demands to know the number and data types of the resulting columns at call time. The only way to make it "dynamic" is a two-step process:
Generate the query.
Execute it.
If you don't actually need separate columns, returning arrays or document-type columns (json, jsonb, xml, hstore, ...) containing a variable number of data sets would be a feasible alternative.
See:
Execute a dynamic crosstab query
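As a minimal sketch of that two-step process, assuming PostgreSQL and a hypothetical source table stats(userid, time_window, propertyid, cnt, sum_) — only two of the four aggregates are shown to keep it short:

-- step 1: generate the pivot query from the distinct propertyId values
-- (assumes the propertyId values are usable as identifier fragments)
SELECT 'SELECT userid, time_window, '
    || string_agg(format(
           'MAX(cnt)  FILTER (WHERE propertyid = %1$L) AS %1$s_count, '
        || 'MAX(sum_) FILTER (WHERE propertyid = %1$L) AS %1$s_sum',
           propertyid), ', ')
    || ' FROM stats GROUP BY userid, time_window;'
FROM (SELECT DISTINCT propertyid FROM stats) p;
-- step 2: execute the generated text, e.g. via EXECUTE in a PL/pgSQL function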
I have a few different footfall sensors across some stores.
Some sensors record footfall every 15min, others every 30min, and others every 60min. Not all stores have all sensors, and some data readings are missing.
The readings of the 60min sensor are the most accurate, followed by the 30min sensor, followed by the 15min.
I have this in a SQL table of the form:
sensorId  storeId  sensorType  readingTimeStamp  value
1         1        15m         2022-06-01 10:45  99
1         1        15m         2022-06-01 11:00  51
1         1        15m         2022-06-01 11:15  19
1         1        15m         2022-06-01 11:30  12
2         1        30m         2022-06-01 11:00  86
2         1        30m         2022-06-01 11:30  89
3         1        60m         2022-06-01 11:00  115
The task is to calculate the footfall in the last 60min every 15 minutes.
The logic is as follows:
If the last 60min are available from a 60min sensor use that
Else try to use the sum of two 30min sensors
Else try to use a 30min and two 15min sensors
Else try to use four of the 15min sensors
If we are not "on-the-hour" then build the reading starting from the last hourly measure
For example, at 11:45 the "best" way to calculate this is:
60min reading at 11:00 + 30min reading at 11:30 + 15min reading at 11:45 - 30min reading at 10:30 - 15min reading at 10:45
I have some Python code that does this fairly well. But any ideas on how to implement it in SQL?
*PS: this calculation is a business requirement. Also, it can't assume that two 15min readings add up to the 30min reading, etc...
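One possible shape for the on-the-hour case (a sketch only, assuming SQL Server, at most one sensor of each type per store, and that a reading's timestamp marks the end of its interval; rules 1, 2 and 4 are shown, and the mixed 30min-plus-two-15min rule and the off-hour adjustment would still need to be layered on top):

WITH hourly AS (
    SELECT storeId,
           -- bucket each reading into the hour its interval ends in
           DATEADD(HOUR, DATEDIFF(HOUR, 0, DATEADD(MINUTE, 59, readingTimeStamp)), 0) AS hr,
           SUM(CASE WHEN sensorType = '60m' THEN value END) AS v60,
           SUM(CASE WHEN sensorType = '30m' THEN value END) AS v30,
           SUM(CASE WHEN sensorType = '15m' THEN value END) AS v15,
           COUNT(CASE WHEN sensorType = '30m' THEN 1 END)   AS n30,
           COUNT(CASE WHEN sensorType = '15m' THEN 1 END)   AS n15
    FROM readings
    GROUP BY storeId,
             DATEADD(HOUR, DATEDIFF(HOUR, 0, DATEADD(MINUTE, 59, readingTimeStamp)), 0)
)
SELECT storeId, hr,
       COALESCE(v60,                             -- prefer a full 60min reading
                CASE WHEN n30 = 2 THEN v30 END,  -- else two complete 30min readings
                CASE WHEN n15 = 4 THEN v15 END)  -- else four complete 15min readings
       AS footfall_last_60min
FROM hourly;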
We currently have a master table stored in our SQL server with the following example information:
Site  Shift Num  Start Time  End Time  Daily Target
A     1          8:00AM      4:00PM    10000
B     1          7:00AM      3:00PM    12000
B     2          4:00PM      2:00AM    7000
C     1          6:00AM      2:00PM    5000
As you can see, there are multiple sites, each with their own respective shift start and end times as well as a total daily target for the day.
Another table in the DB is populated by users via the use of a PowerApp. This PowerApp will push output values to the server like so:
Site  Shift Number  Output  Timestamp
A     1             2500    3/15/2022 9:45 AM
A     1             4200    3/15/2022 11:15 AM
A     1             5600    3/15/2022 12:37 PM
A     1             7500    3/15/2022 2:15 PM
This table contains a log of all time-stamped output entries for each site / shift.
What I would like to do is do a daily trend of output vs. target. In order to do so, all output values over a specific shift would have to be aggregated in a SUM function for a given shift grouped by the shift day. The resulting view would need to look like this:
Site  Shift Number  Day   Actual  Target
A     1             3/14  9500    10000
B     1             3/14  13000   12000
A     1             3/15  8000    10000
B     1             3/15  10000   12000
This is easy enough for daytime shifts (group by day and sum the output values). However, if you notice in the master table, Site B / Shift 2 crosses midnight. In this example, I would need to sum values from the previous day 4PM up until 2AM of today. The date grouping would be done by the Shift End Time. Here's an example of the problem area:
Site  Shift Number  Output  Timestamp
B     2             3300    3/15/2022 5:45 PM
B     2             2200    3/15/2022 8:15 PM
B     2             1600    3/16/2022 12:37 AM
B     2             2500    3/16/2022 1:15 AM
I would need these four rows to be aggregated in the view as one row like so:
Site  Shift Number  Day   Actual  Target
B     2             3/16  9600    7000
The values should be listed under March 16th, since the end time of the shift occurs then. The values are summed and the target is taken from the daily target master table (7000 for Site B / Shift 2).
How can I properly calculate these outputs for each shift every day, irrespective of whether it crosses into a new day, in a view? Or should I go a different route altogether?
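One possible approach, sketched under assumptions (SQL Server; a master table shifts(Site, ShiftNum, StartTime, EndTime, DailyTarget) with time-typed columns; an output log outputs(Site, ShiftNum, Output, Ts); no entry logged exactly at a shift's end time): shift every timestamp forward by the minutes remaining between the shift's end time and midnight, so all of a shift's entries land on the date the shift ends, even across midnight.

SELECT o.Site,
       o.ShiftNum,
       -- adding (1440 - minutes-into-day of EndTime) pushes 4PM-2AM entries onto the end date
       CAST(DATEADD(MINUTE,
                    1440 - DATEDIFF(MINUTE, CAST('00:00' AS time), s.EndTime),
                    o.Ts) AS date)  AS ShiftDay,
       SUM(o.Output)               AS Actual,
       MAX(s.DailyTarget)          AS Target
FROM outputs o
JOIN shifts  s ON s.Site = o.Site AND s.ShiftNum = o.ShiftNum
GROUP BY o.Site, o.ShiftNum,
         CAST(DATEADD(MINUTE,
                      1440 - DATEDIFF(MINUTE, CAST('00:00' AS time), s.EndTime),
                      o.Ts) AS date);

For Site B / Shift 2 (ends 2:00AM) the offset is 22 hours, so 3/15 5:45PM and 3/16 1:15AM both map to 3/16; for same-day shifts the offset never pushes an entry past midnight.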
I have a bunch of timestamps grouped by ID and type in the sample data shown below.
I would like to find overlapped time between start_time and end_time columns in seconds for each group of ID and between each lead and follower combinations. I would like to show the overlap time only for the first record of each group which will always be the "lead" type.
For example, for ID 1, the follower's start and end times in row 3 overlap with the lead's in row 1 for 193 seconds (from 09:00:00 to 09:03:13). The follower's times in row 3 also overlap with the lead's in row 2 for 133 seconds (from 09:01:00 to 09:03:13). That's a total of 326 seconds (193 + 133).
I used the partition clause to rank rows by ID and type and order them by start_time as a start.
How do I get the overlap column?
row#  ID  type      start_time           end_time             rank  overlap
1     1   lead      2020-05-07 09:00:00  2020-05-07 09:03:34  1     326
2     1   lead      2020-05-07 09:01:00  2020-05-07 09:03:13  2
3     1   follower  2020-05-07 08:59:00  2020-05-07 09:03:13  1
4     2   lead      2020-05-07 11:23:00  2020-05-07 11:33:00  1     300
5     2   follower  2020-05-07 11:27:00  2020-05-07 11:32:00  1
6     3   lead      2020-05-07 14:45:00  2020-05-07 15:00:00  1     305
7     3   follower  2020-05-07 14:44:00  2020-05-07 14:44:45  1
8     3   follower  2020-05-07 14:50:00  2020-05-07 14:55:05  2
In your example, the times completely cover the total duration. If this is always true, you can use the following logic:
select id,
       sum(datediff(second, start_time, end_time)) -
       datediff(second, min(start_time), max(end_time)) as overlap
from t
group by id;
To add this as an additional column, then either use window functions or join in the result from the above query.
If the overall time has gaps, then the problem is quite a bit more complicated. I would suggest that you ask a new question and set up a db fiddle for the problem.
Tried this a couple of ways and got it to work.
I first joined two tables holding the individual records for each type, 'lead' and 'follower', and wrote CASE statements to calculate the later start time for each lead/follower start-time combination and the earlier end time for each lead/follower end-time combination. I stored this in a temp table.
CASE
    WHEN lead_table.start_time > follower_table.start_time THEN lead_table.start_time
    WHEN lead_table.start_time < follower_table.start_time THEN follower_table.start_time
    ELSE lead_table.start_time  -- equal start times: either value works
END as overlap_start_time,
CASE
    WHEN follower_table.end_time < lead_table.end_time THEN follower_table.end_time
    WHEN follower_table.end_time > lead_table.end_time THEN lead_table.end_time
    ELSE lead_table.end_time    -- equal end times: either value works
END as overlap_end_time
Then I created an outer query against the temp table to find the difference between the overlap start and end times for each lead/follower combination, in seconds:
select temp_table.id,
       temp_table.overlap_start_time,
       temp_table.overlap_end_time,
       DATEDIFF_BIG(second,
                    temp_table.overlap_start_time,
                    temp_table.overlap_end_time) as overlap_time
from temp_table