Querying Timedelta df column - pandas

# return rows that are < 1 hour
df = df[df['timedeltas'].dt.total_seconds() < 3600]
After this query I am still left with rows such as Timedelta(-1 days, +23:00:00).
Even when I compare pd.Timedelta(days=-1,hours=23) <= pd.Timedelta(hours=1), I get True. How come?
I am looking for a way to exclude any rows that are >= 1 hour (01:00:00). Thanks

Those rows really do have -3600 seconds. It looks strange, but it makes sense once you decide where your zero is: pd.Timedelta(days=-1,hours=23) is the same as pd.Timedelta(hours=-1). If your origin is the midnight between 01/04/2020 and 02/04/2020, then pd.Timedelta(days=-1,hours=23) means 23:00 on 01/04/2020. If you want to remove those rows, also require the value to be non-negative:
df = df[(df['timedeltas'].dt.total_seconds() < 3600) & (df['timedeltas'].dt.total_seconds() >= 0)]
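To see why the negative value passes the first filter, here is a minimal sketch using only the standard library's datetime.timedelta, which stores negative durations the same way (negative days plus a positive remainder). The sample values and the name within_hour are made up for illustration; it is assumed, as the question reports, that pd.Timedelta normalises the same way.

```python
from datetime import timedelta

minus_one_hour = timedelta(hours=-1)

# A negative duration is stored as negative days plus a positive remainder.
print(minus_one_hour)                                   # -1 day, 23:00:00
print(minus_one_hour == timedelta(days=-1, hours=23))   # True
print(minus_one_hour.total_seconds())                   # -3600.0

# Hypothetical sample data: keep only durations in the range [0, 1 hour).
samples = [timedelta(minutes=30), timedelta(hours=-1), timedelta(hours=2)]
within_hour = [t for t in samples if timedelta(0) <= t < timedelta(hours=1)]
print(within_hour)
```

Only the 30 minute value survives: the negative hour fails the lower bound and the 2 hour value fails the upper bound.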

Related

Retrieve data 60 days prior to their retest date

I have a requirement where I need to retrieve Row(s) 60 days prior to their "Retest Date" which is a column present in the table. I have also attached the screenshot and the field "Retest Date" is highlighted.
reagentlotid  reagentlotdesc  u_retest
RL-0000004    NULL            2021-09-30 17:00:00.00
RL-0000005    NULL            2021-09-29 04:21:00.00
RL-0000006    NULL            2021-09-29 04:22:00.00
RL-0000007    Y-T4            2021-08-28 05:56:00.00
RL-0000008    NULL            2021-09-30 05:56:00.00
RL-0000009    NULL            2021-09-28 04:23:00.00
This is what I was trying to do in SQL Server:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt = DATEADD(DD,60,GETDATE());
But it didn't work: the query above returns 0 rows.
Could someone please help me with this query?

Use a range, if you want all data from the day 60 days hence:
select r.reagentlotid, r.reagentlotdesc, r.u_retestdt
from reagentlot r
where u_retestdt >= CAST(DATEADD(DD,60,GETDATE()) AS DATE)
and u_retestdt < CAST(DATEADD(DD,61,GETDATE()) AS DATE)
Dates are like numbers; the time is like a decimal part. 12:00:00 is halfway through a day, so it's like x.5 - SQL Server even lets you manipulate datetime values by adding fractions of days (adding 0.5 adds 12h). Your original query compared the column to GETDATE() plus 60 days, time of day included, so it could only match a row stamped with exactly that instant.
If you had a column of numbers like 1.1, 1.5, 2.4 and you wanted all the one-point-somethings, you couldn't get any of them by saying score = 1; you would say score >= 1 and score < 2
Generally, you should try to avoid manipulating table data in a query's WHERE clause because it usually makes indexes unusable: if you want "all numbers between 1 and 2", use a range; don't chop the decimal off the table data in order to compare it to 1. Same with dates; don't chop the time off - use a range:
--yes
WHERE score >= 1 and score < 2
--no
WHERE CAST(score as INTEGER) = 1
--yes
WHERE birthdatetime >= '1970-01-01' and birthdatetime < '1970-01-02'
--no
WHERE CAST(birthdatetime as DATE) = '1970-01-01'
Note that I am using a CAST to cut the time off in my recommendation to you, but that's to establish a pair of constants of "midnight on the day 60 days in the future" and "midnight on 61 days in the future" that will be used in the range check.
Follow the rule of thumb of "avoid calling functions on columns in a where clause" and generally, you'll be fine :)
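The same idea outside the database, as a rough sketch in Python: build the two midnight constants once, then compare the untouched values against the half-open range. The function name day_range, the fixed "now" value and the sample retest dates are assumptions for illustration, not part of the original schema.

```python
from datetime import datetime, timedelta

def day_range(now, days_ahead):
    """Return (start, end) so that start <= x < end covers the whole day
    that lies `days_ahead` days after `now`."""
    target_day = now.date() + timedelta(days=days_ahead)
    start = datetime.combine(target_day, datetime.min.time())  # midnight
    return start, start + timedelta(days=1)

now = datetime(2021, 8, 1, 9, 30)          # stand-in for GETDATE()
start, end = day_range(now, 60)

retests = [
    datetime(2021, 9, 30, 17, 0),
    datetime(2021, 9, 29, 4, 21),
    datetime(2021, 9, 30, 5, 56),
]

# Equality against "now + 60 days" only matches one exact instant.
exact = [d for d in retests if d == now + timedelta(days=60)]
# The half-open range matches every time of day on the target date.
matches = [d for d in retests if start <= d < end]
print(exact, matches)
```

Here exact is empty while matches contains both 2021-09-30 rows, which mirrors why the = comparison in the question returned 0 rows.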

Try something like this. -60 days may be the current or previous year. HTH
-- day-of-year and year of the date 60 days ago; taking both from the
-- shifted date handles the case where it falls in the previous year
;with target as (
select DATEPART(dayofyear, DATEADD(day, -60, GETDATE())) as doy
, YEAR(DATEADD(day, -60, GETDATE())) as yr
)
select r.reagentlotid
, r.reagentlotdesc
, cast(r.u_retestdt as date) as u_retestdt
from reagentlot r
inner join target d on DATEPART(dayofyear, r.u_retestdt) = d.doy
and YEAR(r.u_retestdt) = d.yr

pandas convert delta time column into one sole unit time

I am working with a pandas dataframe where I have a deltatime column with values like:
{'deltatime': 0 days 09:06:30 , 0 days 00:30:34, 2 days 23:07:14 }
How can I convert those times into a single unit, such as minutes or hours, so that they are easier to visualize in a graph?
Any ideas?
The user guide here:
https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html
does not make clear how to simply change units.

You can use the total_seconds() method of a timedelta to get its duration in seconds:
seconds = [t.total_seconds() for t in df['deltatime']]
And then convert the units if desired:
#to hours, i.e. 3600 seconds
hours = [t.total_seconds()/3600 for t in df['deltatime']]

You can do:
df['deltatime'] = pd.to_timedelta(df['deltatime']).dt.total_seconds()
Output:
deltatime
0 32790.0
1 1834.0
2 256034.0
You can also perform arithmetic operations on timedelta:
# convert to hours
pd.to_timedelta(df['deltatime']) / pd.Timedelta(hours=1)
Output:
0 9.108333
1 0.509444
2 71.120556
Name: deltatime, dtype: float64
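Dividing one duration by another works the same way with plain datetime.timedelta objects, which may help if the values are handled outside pandas. This is only a sketch with the three sample values from the question typed in by hand; the list names are made up.

```python
from datetime import timedelta

# The three sample durations from the question.
deltas = [
    timedelta(hours=9, minutes=6, seconds=30),
    timedelta(minutes=30, seconds=34),
    timedelta(days=2, hours=23, minutes=7, seconds=14),
]

# Dividing by a unit-sized timedelta yields a plain float in that unit.
hours = [d / timedelta(hours=1) for d in deltas]
minutes = [d / timedelta(minutes=1) for d in deltas]

print([round(h, 6) for h in hours])
print(minutes)
```

Picking the divisor (timedelta(hours=1), timedelta(minutes=1), and so on) is what selects the unit, so the same expression covers any single unit you want on the axis.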

compare date types using between

Below is a query I am running to get all the accounts with a locate date not between the date of the transaction and the transaction date - 60. When I run it, the query returns rows that are incorrect. When looking into this problem I made sure all dates are of the same type (they are all defined as date, not timestamp).
Edit: I have also tried putting the dates in trunc() and to_date() to no avail.
Here are the values of the dates that I am receiving:
skip_locate :22-AUG-13
transaction_date :30-AUG-13
transaction_date - 60 :01-JUL-13
EDIT 2: For those wondering about the dates, and if they are really from 2013:
skip_locate :2013-08-22 00:00:00
transaction_date :2013-08-30 00:00:00
transaction_date - 60 :2013-07-01 00:00:00
Also as I was playing around, when I take away the NOT in the NOT BETWEEN I get no results. This is wrong due to the fact that skip_locate is in fact between the two dates.
Here is the query:
SELECT DISTINCT rl.complaint_date,
rl.complaint_amt,
rl.date_served1,
rl.date_served2,
rl.judgement_date,
rl.skip_locate,
lcc.bal_range_min,
lcc.bal_range_max,
lcc.cost_range_min,
lcc.cost_range_max,
lcc.court,
ah.ACCOUNT,
ah.transaction_code,
ah.transaction_date,
ah.transaction_date - 60 "t - 60",
ah.rule_id,
ah.amount,
ah.description,
r.state,
r.zip_code,
z.county
FROM racctrel r,
ziplist z,
legal_court_cost lcc,
racctlgl rl,
legal_transaction_review ah
WHERE substr(r.zip_code,1,5) = z.zip
AND r.state = lcc.state
AND REPLACE(lcc.county,' ','') = REPLACE(upper(z.county),' ','')
AND r.ACCOUNT = rl.ACCOUNT
AND r.ACCOUNT = ah.ACCOUNT
AND lcc.transaction_code = ah.transaction_code
AND lcc.transaction_code in (2,31)
AND lcc.end_date IS NULL
AND ah.batch_id = 257
and rl.skip_locate not between ah.transaction_date and ah.transaction_date - 60;

In a BETWEEN predicate you place the earliest value first and the latest one second, so the code should be:
... BETWEEN ah.transaction_date - 60 and ah.transaction_date
If you had two dates and were not sure which was earliest and which latest, you would:
... BETWEEN Least(date_1, date_2) and Greatest(date_1, date_2)
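The reason the order matters is that x BETWEEN a AND b is shorthand for x >= a AND x <= b, so with the later date first no value can satisfy both conditions. A small sketch in Python using the dates from the question; the helper name between is hypothetical and only mimics the SQL predicate.

```python
from datetime import date, timedelta

def between(value, low, high):
    """Mimic SQL's `value BETWEEN low AND high` (value >= low AND value <= high)."""
    return low <= value <= high

skip_locate = date(2013, 8, 22)
transaction_date = date(2013, 8, 30)
earlier = transaction_date - timedelta(days=60)   # 2013-07-01

# Later date first: nothing can be >= Aug 30 and <= Jul 1 at the same time.
reversed_bounds = between(skip_locate, transaction_date, earlier)
# Earliest first, latest second: behaves as expected.
ordered_bounds = between(skip_locate, earlier, transaction_date)
# Equivalent of Least/Greatest when the order is unknown.
safe_bounds = between(skip_locate,
                      min(earlier, transaction_date),
                      max(earlier, transaction_date))

print(reversed_bounds, ordered_bounds, safe_bounds)   # False True True
```

Because the reversed predicate is false for every row, NOT BETWEEN returned every row and BETWEEN returned none, which matches what the question describes.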

How to get a SUM of a DATEDIFF but provide cut-off at 24 hours IF a single day is specified

This is actually my first question on stackoverflow, so I sincerely apologize if I am confusing or unclear.
That being said, here is my issue:
I work at a car manufacturing company and we have recently implemented the ability to track when our machines are idle. This is done by assessing the start and end time of the event called "idle_start."
Right now, I am trying to get the SUM of how long a machine is idle. Now, I figured this out BUT, some of the idle_times are LONGER than 24 hours.
So, when I specify that I only want to see the idle_time sums of ONE particular day, the sum is also counting the idle time past 24 hours.
I want to provide the option of CUTTING OFF at that 24 hours. Is this possible?
Here is the query:
SELECT r.`name` 'Producer'
, m.`name` 'Manufacturer'
-- , timediff(re.time_end, re.time_start) 'Idle Time Length'
, SEC_TO_TIME(SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))))) 'Total Time'
, (SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start)))))/3600 'Total Time in Hours'
, (((SUM((TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start)))))/3600))/((IF(r.resource_status_id = 2, COUNT(r.resource_id), NULL))*24) 'Percent Machine is Idle divided by Machine Hours'
FROM resource_event re
JOIN resource_event_type ret
ON re.resource_event_type_id = ret.resource_event_type_id
JOIN resource_event_type reep
ON ret.parent_resource_event_type_id = reep.resource_event_type_id
JOIN resource r
ON r.`resource_id` = re.`resource_id`
JOIN manufacturer m
ON m.`manufacturer_id` = r.`manufacturer_id`
WHERE re.`resource_event_type_id` = 19
AND ret.`parent_resource_event_type_id` = 3
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') >= '2013-08-12'
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') <= '2013-08-18'
-- AND re.`resource_id` = 8
AND "Idle Time Length" IS NOT NULL
AND r.manufacturer_id = 13
AND r.resource_status_id = 2
GROUP BY 1, 2
Feel free to ignore the commented-out lines. And please tell me if I can be more specific, to make this easier to figure out and less of a headache for those willing to help me out.
Thank you so much!

You'll want a conditional SUM, using CASE.
Not sure of syntax for your db exactly, but something like:
, SUM (CASE WHEN TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))/3600 > 24 THEN 0
ELSE TIME_TO_SEC(TIMEDIFF(re.time_end, re.time_start))/3600
END) 'Total Time in Hours'

This is not an attempt to answer your question. It's being presented as an answer rather than a comment for better formatting and readability.
You have this
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') >= '2013-08-12'
AND DATE_FORMAT(re.time_start, '%Y-%m-%d') <= '2013-08-18'
in your where clause. Using functions like this makes your query take longer to execute, especially on indexed fields. Something like this would run quicker (the upper bound is the start of the next day, so that all of 2013-08-18 is still included):
AND re.time_start >= '2013-08-12'
AND re.time_start < '2013-08-19'

Do you want to cut off when start/end are before/after your time range?
You can use a case to adjust it based on your timeframe, e.g. for time_start
case
when re.time_start < timestamp '2013-08-12 00:00:00'
then timestamp '2013-08-12 00:00:00'
else re.time_start
end
similar for time_end and then use those CASEs within your TIMEDIFF.
Btw, your where-condition for a given date range should be:
where time_start < timestamp '2013-08-19 00:00:00'
and time_end >= timestamp '2013-08-12 00:00:00'
This will return all idle times between 2013-08-12 and 2013-08-18
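To make the clamping idea concrete, here is a rough sketch in Python of what the two CASE expressions compute: each idle event is trimmed to the reporting window before its length is summed. The function name overlap, the one-day window and the three sample events are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def overlap(start, end, window_start, window_end):
    """Length of the part of [start, end) that falls inside the window."""
    clamped_start = max(start, window_start)   # CASE on time_start
    clamped_end = min(end, window_end)         # CASE on time_end
    # Events entirely outside the window contribute nothing.
    return max(clamped_end - clamped_start, timedelta(0))

window_start = datetime(2013, 8, 12)
window_end = datetime(2013, 8, 13)             # exclusive upper bound

events = [
    (datetime(2013, 8, 11, 20), datetime(2013, 8, 12, 6)),   # 6h inside
    (datetime(2013, 8, 12, 22), datetime(2013, 8, 14, 1)),   # 2h inside
    (datetime(2013, 8, 12, 9), datetime(2013, 8, 12, 10)),   # 1h inside
]

total = sum((overlap(s, e, window_start, window_end) for s, e in events),
            timedelta(0))
total_hours = total / timedelta(hours=1)
print(total_hours)   # 9.0
```

With this approach a single machine can never report more than 24 idle hours for a one-day window, because every event is cut at the window edges before it is added up.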

SQL Select within a range of seconds

I have the table below
2012-05-24 19:00:00.000
2012-07-27 15:51:18.750
2012-07-30 09:40:25.333
2012-07-30 14:25:27.563
2012-07-27 15:51:18.750
2012-07-30 09:40:25.333
2012-07-30 14:25:27.563
2012-05-12 09:23:16.850
2012-05-12 18:00:00.000
I am trying to do a range select, so for example
SELECT * FROM RUN WHERE RUN_DATETIME = '14:25:29.563'
This is a very simple select, but my problem is that the date I am searching for could be up to 30 seconds off from what is in the table above. I need to do the same as above but with a 30 second window, and I am not sure what the best way to do this is.
This select is not based on another row, just on the rows whose RUN_DATETIME falls within the window.
I am using SQL Server 2008 R2

SELECT * FROM RUN
WHERE RUN_DATETIME < DATEADD(s, 30, '14:25:29.563') AND
RUN_DATETIME > DATEADD(s, -30, '14:25:29.563')
More complicated looking than podiluska's answer, but this works with indexes by pre-calculating the range.

SELECT *
FROM RUN
WHERE ABS(DATEDIFF(s, RUN_DATETIME , '14:25:29.563' ))< 30

SELECT
*
FROM
RUN
WHERE
RUN_DATETIME >= DATEADD(second, -30, '14:25:29.563')
AND RUN_DATETIME < DATEADD(second, 30, '14:25:29.563')
This is longer than the ABS(DATEDIFF()) version. It is, however, much faster when applied to indexed fields.
That is because the optimiser can easily see that you want all records within one sequential block. It can search for the start, then search for the end, and return everything between.
The ABS(DATEDIFF()) variation requires every row to be checked independently, and makes no use of indexes or range seeks. It's a full scan of the whole table.
EDIT:
Also note that I use >= and <. This is standard practice for ranges of time.
For example val >= 0 AND val < 60 and val >= 60 AND val < 120 ensures that the value at val = 60 is only counted in one range of time.
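As a loose analogy for the range seek, a sorted Python list can stand in for the index: two binary searches find the start and end of the block, and everything between is returned without inspecting the other rows. The sample values come from the question's table; the target date of 2012-07-30 is an assumption, since the question's literal only gives a time.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# A sorted list plays the role of an index on RUN_DATETIME.
runs = sorted([
    datetime(2012, 5, 24, 19, 0, 0),
    datetime(2012, 7, 27, 15, 51, 18, 750000),
    datetime(2012, 7, 30, 9, 40, 25, 333000),
    datetime(2012, 7, 30, 14, 25, 27, 563000),
    datetime(2012, 5, 12, 18, 0, 0),
])

target = datetime(2012, 7, 30, 14, 25, 29, 563000)   # assumed date
low = target - timedelta(seconds=30)
high = target + timedelta(seconds=30)

# Two seeks delimit the half-open block low <= x < high.
first = bisect_left(runs, low)
last = bisect_left(runs, high)
hits = runs[first:last]
print(hits)
```

Using bisect_left for both ends gives exactly the >= low and < high semantics described above, so a value sitting on a boundary is counted in only one of two adjacent windows.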

I think the RUN_DateTime column contains a full date, while the comparison here is against a time only, '14:25:29.563'. I agree with podiluska's answer; we would just need to turn '14:25:29.563' into a full datetime by taking the day, month and year from the RUN_DateTime column. We can do that with the DATEPART function.
