pandas to_datetime convert 6PM to 18

Is there a nice way to convert Series data represented like 1PM or 11AM to 13 and 11 respectively, using to_datetime or something similar (other than re)?
data:
series
1PM
11AM
2PM
6PM
6AM
desired output:
series
13
11
14
18
6
pd.to_datetime(df['series']) gives the following error:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 11:00:00

You can pass the format explicitly, here %I%p (12-hour clock hour followed by the AM/PM marker):
pd.to_datetime(df['series'], format='%I%p').dt.hour
The .dt.hour accessor then extracts the hour from each timestamp. This gives us:
>>> df = pd.DataFrame({'series': ['1PM', '11AM', '2PM', '6PM', '6AM']})
>>> pd.to_datetime(df['series'], format='%I%p').dt.hour
0 13
1 11
2 14
3 18
4 6
Name: series, dtype: int64
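The same format string also covers the two 12 o'clock edge cases, which are easy to get wrong when converting by hand. The sketch below uses a hypothetical sample (the values 12AM and 12PM are not in the original question) to show how they come out:

```python
import pandas as pd

# Hypothetical sample that includes the two 12 o'clock edge cases.
times = pd.Series(["12AM", "12PM", "1PM", "11AM"], name="series")

# %I parses the hour on a 12-hour clock, %p parses the AM/PM marker.
hours = pd.to_datetime(times, format="%I%p").dt.hour

print(hours.tolist())
```

Here 12AM maps to hour 0 and 12PM to hour 12, so no special handling is needed.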

Related

Pandas -- get dates closest to nth day of month

This Python code identifies the rows where the day of the month equals 5. For a month that does not have day 5, because it is a weekend or holiday, I want the mask to be True for the earlier date that is closest to day 5. I could write a loop to identify such dates, but is there an array formula to do this?
import pandas as pd
infile = "dates.csv"
df = pd.read_csv(infile)
dtimes = pd.to_datetime(df.iloc[:,0])
mask = (dtimes.dt.day == 5)

For testing purposes, I created the following DataFrame (with a single column):
xxx
0 2022-11-03
1 2022-11-04
2 2022-11-07
3 2022-12-02
4 2022-12-05
5 2022-12-06
6 2023-01-04
7 2023-01-05
8 2023-01-06
9 2023-02-02
10 2023-02-03
11 2023-02-06
12 2023-02-07
13 2023-04-02
14 2023-04-05
15 2023-04-06
Because my solution is based on the groupby method, I created dtimes
as a Series whose index is equal to its values:
wrk = pd.to_datetime(df.iloc[:,0])
dtimes = pd.Series(wrk.values, index=wrk)
Then, to find the valid date within the current group of dates
(a single month), I defined the following function:
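def findDate(grp):
    # Empty group: the month has no rows at all
    if grp.size == 0:
        return None
    dd = grp.dt.day
    if dd.eq(5).any():
        dd = dd[dd.eq(5)]
    else:
        dd = dd[dd.lt(5)]
    # No date on or before day 5 in this month
    if dd.empty:
        return None
    return dd.index[-1]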
To find valid dates for the months that actually occur in the data, run (in pandas 2.2 and later, the 'M' alias is deprecated in favor of 'ME'):
validDates = dtimes.groupby(pd.Grouper(freq='M')).apply(findDate).dropna()
The result is:
xxx
2022-11-30 2022-11-04
2022-12-31 2022-12-05
2023-01-31 2023-01-05
2023-02-28 2023-02-03
2023-04-30 2023-04-05
dtype: datetime64[ns]
And to create your mask, run:
mask = dtimes.isin(validDates).values
To see the filtered rows, run:
df[mask]
getting:
xxx
1 2022-11-04
4 2022-12-05
7 2023-01-05
10 2023-02-03
14 2023-04-05
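As a possible alternative to the custom function, the same mask can be sketched without apply: keep only the dates on or before the target day, then take the latest one per calendar month. This is a minimal sketch under the assumption that the dates are already parsed; the names target_day, candidates and latest are illustrative and not from the original post, and the sample is a shortened version of the test data above.

```python
import pandas as pd

# Shortened version of the test data used above.
dates = pd.to_datetime(pd.Series([
    "2022-11-03", "2022-11-04", "2022-11-07",
    "2022-12-02", "2022-12-05", "2022-12-06",
    "2023-02-02", "2023-02-03", "2023-02-06",
]))

target_day = 5

# Only dates on or before the target day can qualify.
candidates = dates[dates.dt.day <= target_day]

# Latest qualifying date within each calendar month.
latest = candidates.groupby(candidates.dt.to_period("M")).max()

# True for the rows holding the chosen date of each month.
mask = dates.isin(latest)

print(dates[mask])
```

Months with no date on or before the target day simply produce no candidate, so they contribute no True entries to the mask.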

change multiple date time formats to single format in pandas dataframe

I have a DataFrame with multiple formats as shown below
0 07-04-2021
1 06-03-1991
2 12-10-2020
3 07/04/2021
4 05/12/1996
What I want is to have one format after applying the Pandas function to the entire column so that all the dates are in the format
date/month/year
What I tried is the following
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', format='%d/%m/%Y')
But it is not working out. Can this be done? Thank you

Try with dayfirst=True, so that ambiguous values are read as day/month rather than month/day:
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', dayfirst=True)
output of date1:
0 2021-04-07
1 1991-03-06
2 2020-10-12
3 2021-04-07
4 1996-12-05
Name: Date_Reported, dtype: datetime64[ns]
If you need the result as strings in day/month/year format:
date1 = date1.dt.strftime('%d/%m/%Y')
output of date1:
0 07/04/2021
1 06/03/1991
2 12/10/2020
3 07/04/2021
4 05/12/1996
Name: Date_Reported, dtype: object
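Recent pandas versions (2.0 and later) infer a single format from the first value, so a column that mixes separators may yield NaT for the rows that do not match when errors='coerce' is used. One way around this, sketched here as an assumption rather than as the answer's method, is to normalize the separator first and then parse with one explicit format. The name normalized is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_Reported": [
        "07-04-2021", "06-03-1991", "12-10-2020",
        "07/04/2021", "05/12/1996",
    ]
})

# Replace the dash separator so every value uses the same layout.
normalized = df["Date_Reported"].str.replace("-", "/", regex=False)

# One explicit day/month/year format now fits every row.
date1 = pd.to_datetime(normalized, format="%d/%m/%Y", errors="coerce")

# Back to strings in the desired day/month/year layout.
formatted = date1.dt.strftime("%d/%m/%Y")

print(formatted.tolist())
```

Because the format is explicit, the result does not depend on how a given pandas version guesses the layout.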

Pandas 1.0 create column of months from year and date

I have a dataframe df with values as:
df.iloc[1:4, 7:9]
Year Month
38 2020 4
65 2021 4
92 2022 4
I am trying to create a new MonthIdx column as:
df['MonthIdx'] = pd.to_timedelta(df['Year'], unit='Y') + pd.to_timedelta(df['Month'], unit='M') + pd.to_timedelta(1, unit='D')
But I get the error:
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta values durations.
Following is the desired output:
df['MonthIdx']
MonthIdx
38 2020/04/01
65 2021/04/01
92 2022/04/01

You can zero-pad the month, concatenate it with the year, and parse the result with an explicit format. Building the string from the original columns keeps the DataFrame's index, so the assignment aligns with the existing rows:
month = df.Month.astype(str).str.pad(width=2, side='left', fillchar='0')
df['MonthIdx'] = pd.to_datetime(df['Year'].astype(str) + month, format='%Y%m')
This will give you:
Year Month MonthIdx
0 2020 4 2020-04-01
1 2021 4 2021-04-01
2 2022 4 2022-04-01
You can then convert the dates to strings to match your format exactly:
df['MonthIdx'] = df['MonthIdx'].dt.strftime('%Y/%m/%d')
Giving you:
Year Month MonthIdx
0 2020 4 2020/04/01
1 2021 4 2021/04/01
2 2022 4 2022/04/01
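Another option, shown here as a sketch rather than as the answer's own approach, relies on pd.to_datetime accepting a DataFrame with year, month and day columns. The helper name parts and the extra column MonthIdxText are illustrative; the index values mirror the question's sample.

```python
import pandas as pd

# Sample mirroring the question, including its non-default index.
df = pd.DataFrame({"Year": [2020, 2021, 2022], "Month": [4, 4, 4]},
                  index=[38, 65, 92])

# to_datetime can assemble dates from year/month/day columns.
parts = df[["Year", "Month"]].rename(columns=str.lower).assign(day=1)
df["MonthIdx"] = pd.to_datetime(parts)

# Optional: string form in the layout asked for in the question.
df["MonthIdxText"] = df["MonthIdx"].dt.strftime("%Y/%m/%d")

print(df)
```

This avoids string padding entirely and keeps the original index, so the new column lines up with the existing rows.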

Add random datetimes to timestamps

I have a column of timestamps that span over 24 hours. I want to convert these to differentiate between days. I've done this by converting to timedelta. The result is displayed below.
My question is whether these can be converted or rearranged again to provide random datetimes, e.g. dd:mm:yyyy hh:mm:ss.
import pandas as pd
df = pd.DataFrame({
'Time' : ['8:00','18:00','28:00'],
})
df['Time'] = [x + ':00' for x in df['Time']]
df['Time'] = pd.to_timedelta(df['Time'])
Out:
Time
0 0 days 08:00:00
1 0 days 18:00:00
2 1 days 04:00:00
Intended Output:
Time
0 1/01/1904 08:00:00 AM
1 1/01/1904 18:00:00 PM
2 2/01/1904 04:00:00 AM
The input timestamps will never span more than 2 days. Is there a package that can achieve this, or would dummy start and end dates work?

After you convert Time to timedelta, just add the date part:
df.Time+pd.to_datetime('1904-01-01')
0 1904-01-01 08:00:00
1 1904-01-01 18:00:00
2 1904-01-02 04:00:00
Name: Time, dtype: datetime64[ns]
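Put together end to end, the approach might look like the sketch below. The base date is the dummy date from the question; the column names Timestamp and Label and the 24-hour output layout are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["8:00", "18:00", "28:00"]})

# Append seconds, then parse as offsets; 28:00:00 becomes 1 day 04:00:00.
offsets = pd.to_timedelta(df["Time"] + ":00")

# Dummy start date taken from the question's intended output.
base_date = pd.Timestamp("1904-01-01")
df["Timestamp"] = base_date + offsets

# Assumed display layout: day/month/year with a 24-hour clock.
df["Label"] = df["Timestamp"].dt.strftime("%d/%m/%Y %H:%M:%S")

print(df)
```

Any other base date can be substituted for base_date; the offsets carry the day rollover on their own.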

When plotting a dataframe, how to set the x-range for a 'YYYY-MM' value

I have a pandas df with the below values. I can create a nifty chart that looks like the following:
import matplotlib.pyplot as plt
ax = pdf_month.plot(x="month", y="count", kind="bar")
plt.show()
I want to truncate the date range (to ignore 1900-01-01 and other months that are not important), but every time I try I get error messages (see below). The date range would be something like '2016-01' to '2018-04'.
ax.set_xlim(pdf_month['month'][17],pdf_date['count'].values.max())
where pdf_month['month'][17] gives you a value of u'2017-01'.
pdf_month.printSchema
root
|-- month: string (nullable = true)
|-- count: long (nullable = false)
How do I set the range on the month values for a x-value that isn't really an int or a date. I still have the original, pre-grouped dates. Is there a better way to group by month that would allow you to customize the x-axis?
error messages:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Sample output of pdf_month:
month count
0 1900-01 353
1 2015-09 1
2 2015-10 2
3 2015-11 2
4 2015-12 1
5 2016-01 1
6 2016-02 1
7 2016-03 3
8 2016-04 2
9 2016-05 5
10 2016-06 7
11 2016-07 13
12 2016-08 12
13 2016-09 41
14 2016-10 19
15 2016-11 17
16 2016-12 20

You can try date indexing: a pandas Series can be sliced by date strings as follows:
df.month['2016-01': '2018-04']
This works only with a datetime index, so the month strings must first be converted with pd.to_datetime and set as the index.
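A minimal sketch of that idea, assuming the data is a pandas DataFrame shaped like the sample output in the question (only a few of its rows are reproduced here, and the names by_month and subset are illustrative):

```python
import pandas as pd

# A few rows from the question's sample output.
pdf_month = pd.DataFrame({
    "month": ["1900-01", "2015-12", "2016-01", "2016-02", "2016-12"],
    "count": [353, 1, 1, 1, 20],
})

# Parse the 'YYYY-MM' strings and use them as a sorted datetime index.
by_month = (
    pdf_month
    .assign(month=pd.to_datetime(pdf_month["month"], format="%Y-%m"))
    .set_index("month")
    .sort_index()
)

# Partial-string slicing keeps only the months inside the range.
subset = by_month.loc["2016-01":"2018-04"]

print(subset)
```

The filtered frame can then be plotted in place of the full one, which drops the 1900-01 placeholder without touching the axis limits.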
