pandas to_datetime convert 6PM to 18

Is there a nice way to convert Series data represented like 1PM or 11AM to 13 and 11 respectively, using to_datetime or similar (other than re)?
data:
series
1PM
11AM
2PM
6PM
6AM
desired output:
series
13
11
14
18
6
pd.to_datetime(df['series']) gives the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 11:00:00

You can pass the format explicitly. Here it is %I%p, where %I is the hour on a 12-hour clock and %p is the AM/PM marker:
pd.to_datetime(df['series'], format='%I%p').dt.hour
The .dt.hour accessor then extracts the hour from each timestamp. This gives us:
>>> df = pd.DataFrame({'series': ['1PM', '11AM', '2PM', '6PM', '6AM']})
>>> pd.to_datetime(df['series'], format='%I%p').dt.hour
0 13
1 11
2 14
3 18
4 6
Name: series, dtype: int64
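As a small sketch of the edge cases, the same format also covers 12AM and 12PM, and errors='coerce' turns unparseable values into NaT instead of raising. The sample values here (including 'noon') are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample: two boundary hours, one regular value, one bad value.
s = pd.Series(['12AM', '1PM', '12PM', 'noon'])

# %I = hour on a 12-hour clock, %p = AM/PM marker.
# errors='coerce' converts values that do not match the format into NaT.
parsed = pd.to_datetime(s, format='%I%p', errors='coerce')

# .dt.hour yields NaN for NaT rows, so the result is a float column here.
hours = parsed.dt.hour
print(hours)
```

If the column can contain bad values, dropping or filling the NaN rows before casting to an integer type keeps the result clean.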

Related

Pandas -- get dates closest to nth day of month

This Python code identifies the rows where the day of the month equals 5. For a month that does not have day 5, because it is a weekend or holiday, I want the mask to be True for the earlier date that is closest to day 5. I could write a loop to identify such dates, but is there an array formula to do this?
import pandas as pd
infile = "dates.csv"
df = pd.read_csv(infile)
dtimes = pd.to_datetime(df.iloc[:,0])
mask = (dtimes.dt.day == 5)
For test purposes, I created the following DataFrame (with a single column):
xxx
0 2022-11-03
1 2022-11-04
2 2022-11-07
3 2022-12-02
4 2022-12-05
5 2022-12-06
6 2023-01-04
7 2023-01-05
8 2023-01-06
9 2023-02-02
10 2023-02-03
11 2023-02-06
12 2023-02-07
13 2023-04-02
14 2023-04-05
15 2023-04-06
Because I based my solution on the groupby method, I created dtimes
as a Series whose index is equal to its values:
wrk = pd.to_datetime(df.iloc[:,0])
dtimes = pd.Series(wrk.values, index=wrk)
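Then, to find the valid date within the current group of dates
(a single month), I defined the following function:
def findDate(grp):
    # Months with no rows at all produce an empty group.
    if grp.size == 0:
        return None
    dd = grp.dt.day
    if dd.eq(5).any():
        # Day 5 is present: use it.
        dd = dd[dd.eq(5)]
    else:
        # Otherwise fall back to the days before the 5th.
        dd = dd[dd.lt(5)]
    # Guard: the month may have no date on or before the 5th.
    if dd.empty:
        return None
    # The index holds the dates in order, so the last one is closest to day 5.
    return dd.index[-1]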
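To find the valid dates for the months present in the data, run the following (on pandas 2.2 and later, freq='ME' is the preferred spelling of the month-end frequency):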
validDates = dtimes.groupby(pd.Grouper(freq='M')).apply(findDate).dropna()
The result is:
xxx
2022-11-30 2022-11-04
2022-12-31 2022-12-05
2023-01-31 2023-01-05
2023-02-28 2023-02-03
2023-04-30 2023-04-05
dtype: datetime64[ns]
And to create your mask, run:
mask = dtimes.isin(validDates).values
To see the filtered rows, run:
df[mask]
getting:
xxx
1 2022-11-04
4 2022-12-05
7 2023-01-05
10 2023-02-03
14 2023-04-05
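For comparison, here is a sketch of a fully vectorized variant that avoids the custom function: keep only dates on or before day 5, then mark the latest such date in each month. It assumes the dates are unique, and the variable names are illustrative rather than taken from the answer above.

```python
import pandas as pd

# Same test dates as in the answer above.
dates = pd.to_datetime(pd.Series([
    '2022-11-03', '2022-11-04', '2022-11-07', '2022-12-02',
    '2022-12-05', '2022-12-06', '2023-01-04', '2023-01-05',
    '2023-01-06', '2023-02-02', '2023-02-03', '2023-02-06',
    '2023-02-07', '2023-04-02', '2023-04-05', '2023-04-06',
]))

# Only dates on or before the 5th can qualify.
candidates = dates[dates.dt.day <= 5]

# For every candidate, look up the latest candidate date in its month.
latest = candidates.groupby(candidates.dt.to_period('M')).transform('max')

# A row is selected when it is itself the latest candidate of its month.
keep = candidates.index[(candidates == latest).to_numpy()]
mask = dates.index.isin(keep)

print(dates[mask])
```

Because day 5 is the largest day allowed through the filter, taking the maximum per month picks day 5 when it exists and the closest earlier date otherwise.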

change multiple date time formats to single format in pandas dataframe

I have a DataFrame with multiple formats as shown below
0 07-04-2021
1 06-03-1991
2 12-10-2020
3 07/04/2021
4 05/12/1996
What I want is to have one format after applying the Pandas function to the entire column so that all the dates are in the format
date/month/year
What I tried is the following
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', format='%d/%m/%Y')
But it is not working out. Can this be done? Thank you
Try with dayfirst=True:
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', dayfirst=True)
Output of date1:
0 2021-04-07
1 1991-03-06
2 2020-10-12
3 2021-04-07
4 1996-12-05
Name: Date_Reported, dtype: datetime64[ns]
If you need the dates back as day/month/year strings:
date1 = date1.dt.strftime('%d/%m/%Y')
Output of date1:
0 07/04/2021
1 06/03/1991
2 12/10/2020
3 07/04/2021
4 05/12/1996
Name: Date_Reported, dtype: object
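Newer pandas releases (2.0 and later, as far as I know) infer one format from the first value, so a column that mixes '-' and '/' separators may come back partly as NaT with errors='coerce'. A version-independent sketch is to normalize the separator first and then parse with one explicit format. The Series below reuses the question's sample values.

```python
import pandas as pd

raw = pd.Series(
    ['07-04-2021', '06-03-1991', '12-10-2020', '07/04/2021', '05/12/1996'],
    name='Date_Reported',
)

# Make every value use '/' so a single explicit format fits the whole column.
normalized = raw.str.replace('-', '/', regex=False)

# Day first, then month, then four-digit year.
parsed = pd.to_datetime(normalized, format='%d/%m/%Y', errors='coerce')

# Back to strings in the requested day/month/year layout.
formatted = parsed.dt.strftime('%d/%m/%Y')
print(formatted)
```

An explicit format also removes any day/month ambiguity, since nothing is left for the parser to guess.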

Pandas 1.0 create column of months from year and date

I have a dataframe df with values as:
df.iloc[1:4, 7:9]
Year Month
38 2020 4
65 2021 4
92 2022 4
I am trying to create a new MonthIdx column as:
df['MonthIdx'] = pd.to_timedelta(df['Year'], unit='Y') + pd.to_timedelta(df['Month'], unit='M') + pd.to_timedelta(1, unit='D')
But I get the error:
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
Following is the desired output:
df['MonthIdx']
MonthIdx
38 2020/04/01
65 2021/04/01
92 2022/04/01
You can zero-pad the month value, join it with the year, and parse the result with an explicit format. Passing index=df.index keeps the new values aligned when df does not have a default 0..n-1 index:
month = df.Month.astype(str).str.pad(width=2, side='left', fillchar='0')
df['MonthIdx'] = pd.to_datetime(pd.Series([int('%d%s' % (x, y)) for x, y in zip(df['Year'], month)], index=df.index), format='%Y%m')
This will give you:
Year Month MonthIdx
0 2020 4 2020-04-01
1 2021 4 2021-04-01
2 2022 4 2022-04-01
You can reformat the date to be a string to match exactly your format:
df['MonthIdx'] = df['MonthIdx'].dt.strftime('%Y/%m/%d')
Giving you:
Year Month MonthIdx
0 2020 4 2020/04/01
1 2021 4 2021/04/01
2 2022 4 2022/04/01
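As an alternative sketch, pd.to_datetime can assemble dates directly from a DataFrame whose columns are named year, month and day, which avoids the string padding step. The frame below mimics the question's data, including its non-default index. The helper name parts is illustrative.

```python
import pandas as pd

# Sample frame shaped like the one in the question.
df = pd.DataFrame({'Year': [2020, 2021, 2022], 'Month': [4, 4, 4]},
                  index=[38, 65, 92])

# to_datetime recognizes the column names 'year', 'month' and 'day'.
parts = pd.DataFrame({'year': df['Year'], 'month': df['Month'], 'day': 1})

# The result keeps the original index, so the assignment aligns row by row.
df['MonthIdx'] = pd.to_datetime(parts)

# Optional: render as strings in the requested YYYY/MM/DD layout.
labels = df['MonthIdx'].dt.strftime('%Y/%m/%d')
print(labels)
```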

Add random datetimes to timestamps

I have a column of timestamps that span over 24 hours. I want to convert these to differentiate between days. I've done this by converting to timedelta. The result is displayed below.
The question I have is, can these be converted or re-arranged again to provide random datetimes. e.g. dd:mm:yyyy hh:mm:ss.
import pandas as pd
df = pd.DataFrame({
'Time' : ['8:00','18:00','28:00'],
})
df['Time'] = [x + ':00' for x in df['Time']]
df['Time'] = pd.to_timedelta(df['Time'])
Out:
Time
0 0 days 08:00:00
1 0 days 18:00:00
2 1 days 04:00:00
Intended Output:
Time
0 1/01/1904 08:00:00 AM
1 1/01/1904 18:00:00 PM
2 2/01/1904 04:00:00 AM
The input timestamps will never span more than 2 days. Is there a package that can achieve this, or would dummy start and end dates work?
After you convert Time to timedelta, just add the date part:
df.Time + pd.to_datetime('1904-01-01')
0 1904-01-01 08:00:00
1 1904-01-01 18:00:00
2 1904-01-02 04:00:00
Name: Time, dtype: datetime64[ns]
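Putting the steps together, a minimal end-to-end sketch looks like this. The base date 1904-01-01 comes from the question. The column names Datetime and Label are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['8:00', '18:00', '28:00']})

# Append seconds so the strings parse as HH:MM:SS durations;
# hours past 24 roll over into the next day.
offsets = pd.to_timedelta(df['Time'] + ':00')

# Any dummy start date works; this one is taken from the question.
base = pd.Timestamp('1904-01-01')
df['Datetime'] = base + offsets

# Optional: format as day/month/year strings with a 24-hour clock.
df['Label'] = df['Datetime'].dt.strftime('%d/%m/%Y %H:%M:%S')
print(df)
```

A 24-hour clock is used in the label because combining 18:00:00 with an AM/PM suffix would be redundant.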

When plotting a dataframe, how to set the x-range for a 'YYYY-MM' value

I have a pandas df with the below values. I can create a nifty chart that looks like the following:
import matplotlib.pyplot as plt
ax = pdf_month.plot(x="month", y="count", kind="bar")
plt.show()
I want to truncate the date range (to ignore 1900-01-01 and other months that are not important), but every time I try I get error messages (see below). The date range would be something like '2016-01' to '2018-04'.
ax.set_xlim(pdf_month['month'][17],pdf_date['count'].values.max())
where pdf_month['month'][17] gives you a value of u'2017-01'.
pdf_month.printSchema
root
|-- month: string (nullable = true)
|-- count: long (nullable = false)
How do I set the range on the month values for a x-value that isn't really an int or a date. I still have the original, pre-grouped dates. Is there a better way to group by month that would allow you to customize the x-axis?
error messages:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Sample output of pdf_month:
month count
0 1900-01 353
1 2015-09 1
2 2015-10 2
3 2015-11 2
4 2015-12 1
5 2016-01 1
6 2016-02 1
7 2016-03 3
8 2016-04 2
9 2016-05 5
10 2016-06 7
11 2016-07 13
12 2016-08 12
13 2016-09 41
14 2016-10 19
15 2016-11 17
16 2016-12 20
You can try date slicing, which pandas Series support when they have a datetime index:
df.month['2016-01': '2018-04']
This works with datetime indexes.
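Since the month column in the question holds strings rather than dates, here is a sketch of two ways to restrict the range before plotting. It assumes pdf_month is a pandas DataFrame and uses a shortened copy of the sample data. The names subset, by_date and sliced are illustrative.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
pdf_month = pd.DataFrame({
    'month': ['1900-01', '2015-12', '2016-01', '2016-02', '2016-12'],
    'count': [353, 1, 1, 1, 20],
})

# Option 1: zero-padded 'YYYY-MM' strings sort in date order,
# so a plain string comparison selects the range.
subset = pdf_month[pdf_month['month'].between('2016-01', '2018-04')]

# Option 2: build a real datetime index, then slice it by partial date strings.
by_date = pdf_month.set_index(pd.to_datetime(pdf_month['month'], format='%Y-%m'))
sliced = by_date.sort_index().loc['2016-01':'2018-04']

print(subset)
print(sliced)
```

Either filtered frame can then be plotted as before, for example with subset.plot(x="month", y="count", kind="bar"), which sidesteps set_xlim on string labels.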