How can I convert, in pandas, a date column that looks something like this:
2018-08-27 00:00:00.000
2018-08-26 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-24 00:00:00.000
2018-08-23 00:00:00.000
2018-08-23 00:00:00.000
2018-08-20 00:00:00.000
2018-08-20 00:00:00.000
to an integer count of the days since 1 January 2010?
Subtract the base date from the column with Series.sub, then convert the resulting timedeltas to whole days with Series.dt.days:
import pandas as pd
df['days'] = pd.to_datetime(df['date']).sub(pd.Timestamp('2010-01-01')).dt.days
print(df)
date days
0 2018-08-27 00:00:00.000 3160
1 2018-08-26 00:00:00.000 3159
2 2018-08-24 00:00:00.000 3157
3 2018-08-24 00:00:00.000 3157
4 2018-08-24 00:00:00.000 3157
5 2018-08-24 00:00:00.000 3157
6 2018-08-23 00:00:00.000 3156
7 2018-08-23 00:00:00.000 3156
8 2018-08-20 00:00:00.000 3153
9 2018-08-20 00:00:00.000 3153
As jezrael's answer shows, you can call sub directly on the pandas Timestamp column, which is the most direct approach.
If you would rather process the values one at a time, you can do it with map:
base_date = pd.Timestamp('2010-01-01 00:00:00')
df['days'] = df['date'].map(lambda date: (pd.Timestamp(date) - base_date).days)
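As a cross-check that does not depend on pandas, the same day count can be sketched with the standard library alone. The helper name days_since and the constant BASE_DATE are made up for illustration, and the format string assumes the values look exactly like the samples in the question.

```python
from datetime import datetime

# Assumed base date, taken from the question (1 January 2010).
BASE_DATE = datetime(2010, 1, 1)

def days_since(text, base=BASE_DATE):
    """Return whole days between `base` and a 'YYYY-MM-DD HH:MM:SS.fff' string."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return (parsed - base).days

samples = ["2018-08-27 00:00:00.000", "2018-08-20 00:00:00.000"]
print([days_since(s) for s in samples])  # [3160, 3153]
```

These values agree with the first and last rows of the output shown above.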
Related
I'm looking to calculate the number of days that overlap between two (DateTime) spans of time.
The logic behind the question: a prisoner serves a sentence from ORIG_BED_START (the beginning of the sentence) to ORIG_BED_END (the end of the sentence). During the sentence he took a leave of absence, for whatever reason. The idea is to calculate the number of days that prisoner was away on leave during the sentence.
That means making sure the leave start and end dates fall between the bed start and end dates, then calculating the date difference and ignoring the rest.
Given this existing data:
ORIG_BED_START           ORIG_BED_END             LEAVE_START_DAT          LEAVE_END_DATE           LEAVE_DAYS
2022-10-19 09:21:00.000  2022-11-02 14:49:00.000  2022-10-28 00:00:00.000  2022-11-02 00:00:00.000  ??
2022-11-02 14:50:00.000  2022-11-16 13:19:00.000  2022-10-28 00:00:00.000  2022-11-02 00:00:00.000  ??
2022-12-19 10:17:00.000  2022-12-27 10:59:00.000  2022-12-19 00:00:00.000  2022-12-30 00:00:00.000  ??
2022-12-27 11:00:00.000  NULL                     2022-12-19 00:00:00.000  2022-12-30 00:00:00.000  ??
2022-12-22 20:29:00.000  2022-12-29 17:48:00.000  2022-12-26 00:00:00.000  2022-12-30 00:00:00.000  ??
2022-12-29 17:49:00.000  2022-12-30 14:59:00.000  2022-12-26 00:00:00.000  2022-12-30 00:00:00.000  ??
I am expecting the result set to be:
ORIG_BED_START           ORIG_BED_END             LEAVE_START_DAT          LEAVE_END_DATE           LEAVE_DAYS
2022-10-19 09:21:00.000  2022-11-02 14:49:00.000  2022-10-28 00:00:00.000  2022-11-02 00:00:00.000  5
2022-11-02 14:50:00.000  2022-11-16 13:19:00.000  2022-10-28 00:00:00.000  2022-11-02 00:00:00.000  0
2022-12-19 10:17:00.000  2022-12-27 10:59:00.000  2022-12-19 00:00:00.000  2022-12-30 00:00:00.000  8
2022-12-27 11:00:00.000  NULL                     2022-12-19 00:00:00.000  2022-12-30 00:00:00.000  3
2022-12-22 20:29:00.000  2022-12-29 17:48:00.000  2022-12-26 00:00:00.000  2022-12-30 00:00:00.000  4
2022-12-29 17:49:00.000  2022-12-30 14:59:00.000  2022-12-26 00:00:00.000  2022-12-30 00:00:00.000  0
This is the closest I have come:
CASE
WHEN (CONVERT(DATE, LEAVE_START_DATE) >= CONVERT(DATE, ORIG_BED_START) AND
CONVERT(DATE, LEAVE_START_DATE) <= CONVERT(DATE, ORIG_BED_END))
OR (CONVERT(DATE, LEAVE_END_DATE) >= CONVERT(DATE, ORIG_BED_END) AND CONVERT(DATE, LEAVE_END_DATE) <= CONVERT(DATE, ORIG_BED_END))
THEN DATEDIFF(DAY, LEAVE_START_DATE, LEAVE_END_DATE)
ELSE ''
END AS LEAVE_DAYS
The math to evaluate is MAX(0, MIN(orig_bed_end, leave_end_date) - MAX(orig_bed_start, leave_start_dat)), treating a NULL orig_bed_end as leave_end_date. In SQL that should give:
greatest(0, trunc( convert(date,least(coalesce(orig_bed_end,leave_end_date),leave_end_date)) - convert(date,greatest(orig_bed_start,leave_start_dat))))
Depending on whether you apply the trunc before or after calculating the difference, the result may differ by one day (±1).
(Quickly converted from Oracle syntax, so it may still need fixes to work in SQL Server; for example, SQL Server computes date differences with DATEDIFF rather than by subtracting dates.)
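To make the formula concrete, here is a minimal sketch in plain Python run against the six sample rows from the question. The helper names parse and leave_days are made up for illustration. The sketch assumes the truncation to dates happens before the subtraction, and that a missing bed end is capped by the leave end, as in the coalesce above.

```python
from datetime import datetime

def parse(text):
    """Parse 'YYYY-MM-DD HH:MM', or return None for a missing value."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M") if text else None

def leave_days(bed_start, bed_end, leave_start, leave_end):
    """Whole calendar days of leave that fall inside the bed stay."""
    # An open-ended stay (no bed end) is capped by the end of the leave.
    end = min(bed_end or leave_end, leave_end)
    start = max(bed_start, leave_start)
    # Truncate to dates before subtracting, then clamp negatives to zero.
    return max(0, (end.date() - start.date()).days)

rows = [
    ("2022-10-19 09:21", "2022-11-02 14:49", "2022-10-28 00:00", "2022-11-02 00:00"),
    ("2022-11-02 14:50", "2022-11-16 13:19", "2022-10-28 00:00", "2022-11-02 00:00"),
    ("2022-12-19 10:17", "2022-12-27 10:59", "2022-12-19 00:00", "2022-12-30 00:00"),
    ("2022-12-27 11:00", None, "2022-12-19 00:00", "2022-12-30 00:00"),
    ("2022-12-22 20:29", "2022-12-29 17:48", "2022-12-26 00:00", "2022-12-30 00:00"),
    ("2022-12-29 17:49", "2022-12-30 14:59", "2022-12-26 00:00", "2022-12-30 00:00"),
]
result = [leave_days(*map(parse, row)) for row in rows]
print(result)  # [5, 0, 8, 3, 3, 1]
```

The first four values match the expected result set. For the last two rows the plain overlap formula gives 3 and 1 where the question lists 4 and 0, so the asker may intend a different rule there, such as crediting the whole leave to the stay in which it starts.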
I have this dataframe (below is a sample) with three columns: clientnummer (client number), begindatum (start date) and einddatum (end date)
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2014-03-16 00:00:00.000
3 2012-11-28 00:00:00.000 2015-04-04 00:00:00.000
4 2016-02-12 00:00:00.000 none
4 2015-09-10 00:00:00.000 2016-03-09 00:00:00.000
4 2015-12-01 00:00:00.000 2016-04-18 00:00:00.000
5 2016-09-01 00:00:00.000 2016-12-11 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2017-02-20 00:00:00.000 2018-02-19 00:00:00.000
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
5 2018-09-01 00:00:00.000 2020-08-31 00:00:00.000
5 2015-11-17 00:00:00.000 2017-05-31 00:00:00.000
I want to combine all rows of the same client whose date ranges overlap with other rows for that clientnummer, so that each combined row has the lowest begindatum and the highest einddatum. This also has to account for None values and for clients with only one row.
So the outcome would be something like:
clientnummer begindatum einddatum
3 2013-09-17 00:00:00.000 2015-04-04 00:00:00.000
4 2015-09-10 00:00:00.000 2016-04-18 00:00:00.000
4 2016-02-12 00:00:00.000 none
5 2015-11-17 00:00:00.000 2018-02-19 00:00:00.000
5 2018-02-20 00:00:00.000 2018-08-31 00:00:00.000
5 2018-09-01 00:00:00.000 None
5 2021-01-01 00:00:00.000 2021-07-25 00:00:00.000
5 2022-01-01 00:00:00.000 2022-06-30 00:00:00.000
But I keep getting the error: ERROR:root:Error on transforming data: single positional indexer is out-of-bounds
This is my relevant code:
df_grouped = df_indicaties.groupby('clientnummer')
# create a result table
result = pd.DataFrame(columns=['clientnummer', 'begindatum', 'einddatum'])
for client, groep in df_grouped:
    # Check if the group is empty or has only one row
    if groep.empty or groep.shape[0] == 1:
        result = pd.concat([result, pd.DataFrame({'clientnummer': client, 'begindatum': groep['begindatum'].iloc[0], 'einddatum': groep['einddatum'].iloc[0]}, index=[0])])
    else:
        begindatum_start = groep.iloc[0]['begindatum']
        einddatum_start = groep.iloc[0]['einddatum']
        for i in groep.index:
            n_current = groep.iloc[i]['n_indicatie']
            current_begin = groep.iloc[i]['begindatum']
            current_eind = groep.iloc[i]['einddatum']
            if n_current > 1 and begindatum_start is not None and einddatum_start is not None and current_begin is not None and current_eind is not None:
                if current_begin >= begindatum_start and current_begin <= einddatum_start:
                    if einddatum_start <= current_eind:
                        einddatum_start = current_eind
                else:
                    result = result.append({'clientnummer': client, 'begindatum': current_begin, 'einddatum': current_eind}, ignore_index=True)
                    begindatum_start = groep.iloc[i]['begindatum']
                    einddatum_start = groep.iloc[i]['einddatum']
        result = result.append({'clientnummer': client, 'begindatum': begindatum_start, 'einddatum': einddatum_start}, ignore_index=True)
return result
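The error most likely comes from passing labels taken from groep.index to groep.iloc, which expects positions from 0 to len(groep) - 1; inside a group the original labels can easily exceed that. Separately from that fix, the merging rule itself can be sketched in plain Python. The function name merge_ranges is made up for illustration, and the sketch rests on two assumptions read off the desired outcome: ranges are merged only when they actually overlap, and rows without an end date are kept as separate rows.

```python
from datetime import date

def merge_ranges(ranges):
    """Merge overlapping (start, end) date ranges; `end` may be None."""
    closed = sorted(r for r in ranges if r[1] is not None)
    open_ended = sorted((r for r in ranges if r[1] is None), key=lambda r: r[0])
    merged = []
    for start, end in closed:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Assumption: rows without an end date are kept as separate rows.
    return merged + open_ended

# Sample rows for clients 4 and 5 from the question.
clients = {
    4: [(date(2016, 2, 12), None),
        (date(2015, 9, 10), date(2016, 3, 9)),
        (date(2015, 12, 1), date(2016, 4, 18))],
    5: [(date(2016, 9, 1), date(2016, 12, 11)),
        (date(2018, 2, 20), date(2018, 8, 31)),
        (date(2017, 2, 20), date(2018, 2, 19)),
        (date(2015, 11, 17), date(2017, 5, 31))],
}
merged = {client: merge_ranges(rows) for client, rows in clients.items()}
for client, rows in merged.items():
    for start, end in rows:
        print(client, start, end)
```

Sorting by start date first means each range only has to be compared with the most recently merged one, which avoids positional indexing altogether.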
How can I apply weights from one table [Weights] to another [Port] where the weight table has sparse dates?
[Port] table
utcDT UsdPnl
-----------------------------------------------
2012-03-09 00:00:00.000 -0.00581815226439161
2012-03-11 00:00:00.000 -0.000535272460588547
2012-03-12 00:00:00.000 -0.00353079778650661
2012-03-13 00:00:00.000 0.00232882689252497
2012-03-14 00:00:00.000 -0.0102592811199384
2012-03-15 00:00:00.000 0.00254451559598693
2012-03-16 00:00:00.000 0.0146718613139845
2012-03-18 00:00:00.000 0.000425144543842752
2012-03-19 00:00:00.000 -0.00388548271428044
2012-03-20 00:00:00.000 -0.00662423680184768
2012-03-21 00:00:00.000 0.00405506208635343
2012-03-22 00:00:00.000 -0.000814822806982203
2012-03-23 00:00:00.000 -0.00289523953346103
2012-03-25 00:00:00.000 0.00204150859774465
2012-03-26 00:00:00.000 -0.00641635182718787
2012-03-27 00:00:00.000 -0.00107168420738448
2012-03-28 00:00:00.000 0.00131000520696153
2012-03-29 00:00:00.000 0.0008223678402638
2012-03-30 00:00:00.000 -0.00255345945390133
2012-04-01 00:00:00.000 -0.00337792814650089
[Weights] table
utcDT Weight
--------------------------------
2012-03-09 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-29 00:00:00.000 7
So I want to use the weights as if I had a full table like the one below, i.e. switch to a new weight on the first day it appears in the [Weights] table:
utcDT UsedWeight
----------------------------------
2012-03-09 00:00:00.000 1
2012-03-11 00:00:00.000 1
2012-03-12 00:00:00.000 1
2012-03-13 00:00:00.000 1
2012-03-14 00:00:00.000 1
2012-03-15 00:00:00.000 1
2012-03-16 00:00:00.000 1
2012-03-18 00:00:00.000 1
2012-03-19 00:00:00.000 1
2012-03-20 00:00:00.000 3
2012-03-21 00:00:00.000 3
2012-03-22 00:00:00.000 3
2012-03-23 00:00:00.000 3
2012-03-25 00:00:00.000 3
2012-03-26 00:00:00.000 3
2012-03-27 00:00:00.000 3
2012-03-28 00:00:00.000 3
2012-03-29 00:00:00.000 7
2012-03-30 00:00:00.000 7
2012-04-01 00:00:00.000 7
You can use apply:
select p.*, w.*
from port p outer apply
(select top (1) w.*
from weights w
where w.utcDT <= p.utcDT
order by w.utcDT desc
) w;
outer apply is usually pretty efficient if you have the right indexes. In this case, the right index is on weights(utcDT desc).
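The lookup that the outer apply performs (for each date, take the latest weight dated on or before it) can be sketched in Python with a binary search over the sorted weight dates. The function name used_weight is made up for illustration; the weights are the three rows from the [Weights] table.

```python
from bisect import bisect_right
from datetime import date

# Rows of the [Weights] table, sorted by date.
weights = [(date(2012, 3, 9), 1), (date(2012, 3, 20), 3), (date(2012, 3, 29), 7)]
weight_dates = [d for d, _ in weights]

def used_weight(day):
    """Return the most recent weight on or before `day`, or None if there is none."""
    pos = bisect_right(weight_dates, day)
    return weights[pos - 1][1] if pos else None

port_days = [date(2012, 3, 9), date(2012, 3, 19), date(2012, 3, 20),
             date(2012, 3, 28), date(2012, 4, 1)]
print([used_weight(d) for d in port_days])  # [1, 1, 3, 3, 7]
```

A date earlier than the first weight returns None, which mirrors the NULL that outer apply produces when the subquery finds no row.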
You can use lead() in a subquery to attach to each weights record the next date on which the weight changes, and then join with port using an inequality condition on the dates:
select p.utcDt, w.weight
from port p
inner join (
select utcDt, weight, lead(utcDt) over(order by utcDt) lead_utcDt from weights
) w
on p.utcDt >= w.utcDt
and (w.lead_utcDt is null or p.utcDt < w.lead_utcDt)
I want to get the maximum time of a given date from a query that returns From and To dates.
A function named [fn_COM_PeriodSplit] returns output like this:
PrSeq FromDate ToDate
1 2015-05-01 00:00:00.000 2015-05-25 23:29:29.000
PrSeq FromDate ToDate
1 2015-05-01 00:00:00.000 2015-05-07 00:00:00.000
2 2015-05-08 00:00:00.000 2015-05-14 00:00:00.000
3 2015-05-15 00:00:00.000 2015-05-21 00:00:00.000
4 2015-05-22 00:00:00.000 2015-05-25 23:59:59.000
But I want the ToDate to look like this:
PrSeq FromDate ToDate
1 2015-05-01 00:00:00.000 2015-05-25 23:59:59.000
How can I get the ToDate as the maximum end time?
I have daily values in one table and monthly values in another table. I need to use the values of the monthly table and calculate them on a daily basis.
basically, monthly factor * daily factor -- for each day
thanks!
I have a table like this:
2010-12-31 00:00:00.000 28.3
2010-09-30 00:00:00.000 64.1
2010-06-30 00:00:00.000 66.15
2010-03-31 00:00:00.000 12.54
and a table like this :
2010-12-31 00:00:00.000 98.1
2010-12-30 00:00:00.000 97.61
2010-12-29 00:00:00.000 99.03
2010-12-28 00:00:00.000 97.7
2010-12-27 00:00:00.000 96.87
2010-12-23 00:00:00.000 97.44
2010-12-22 00:00:00.000 97.76
2010-12-21 00:00:00.000 96.63
2010-12-20 00:00:00.000 95.47
2010-12-17 00:00:00.000 95.2
2010-12-16 00:00:00.000 94.84
2010-12-15 00:00:00.000 94.8
2010-12-14 00:00:00.000 94.1
2010-12-13 00:00:00.000 93.88
2010-12-10 00:00:00.000 93.04
2010-12-09 00:00:00.000 91.07
2010-12-08 00:00:00.000 90.89
2010-12-07 00:00:00.000 92.72
2010-12-06 00:00:00.000 93.05
2010-12-03 00:00:00.000 91.74
2010-12-02 00:00:00.000 90.74
2010-12-01 00:00:00.000 90.25
I need to take the value for the quarter and multiply it by the daily value for every day in that quarter.
You could try:
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND FLOOR((MONTH(dt.day)-1)/3) = FLOOR((MONTH(mt.day)-1)/3)
ORDER BY dt.day
or (as suggested by @Andriy)
SELECT dt.day, dt.factor*mt.factor AS daily_factor
FROM daily_table dt INNER JOIN month_table mt
ON YEAR(dt.day) = YEAR(mt.day)
AND DATEPART(QUARTER, dt.day) = DATEPART(QUARTER, mt.day)
ORDER BY dt.day
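The join condition in both queries pairs each daily row with the row of the other table that falls in the same year and quarter. A minimal Python sketch of that matching follows, using a few values from the two sample tables. The names quarter_key, quarterly and daily are made up for illustration.

```python
from datetime import date

def quarter_key(day):
    """Return (year, quarter) with the quarter numbered 1 to 4."""
    return day.year, (day.month - 1) // 3 + 1

# A few rows from the two sample tables in the question.
quarterly = {date(2010, 12, 31): 28.3, date(2010, 9, 30): 64.1}
daily = {date(2010, 12, 31): 98.1, date(2010, 12, 30): 97.61, date(2010, 12, 1): 90.25}

# Index the quarterly values by (year, quarter) so each day can look its value up.
by_quarter = {quarter_key(day): value for day, value in quarterly.items()}

# Inner-join behaviour: days without a matching quarter are skipped.
daily_factor = {
    day: value * by_quarter[quarter_key(day)]
    for day, value in sorted(daily.items())
    if quarter_key(day) in by_quarter
}
for day, factor in daily_factor.items():
    print(day, round(factor, 4))
```

The expression (month - 1) // 3 is the same integer division that the FLOOR variant of the query relies on.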