Convert MM:SS column to HH:MM:SS column in Pandas? - pandas

Convert MM:SS column to HH:MM:SS column in Pandas. I tried every way I could think of, such as changing the datatype and using to_datetime and to_timedelta, but I couldn't convert the series. Can somebody please help? I am getting errors like:
(here ChipTime is in MM:SS format, which I want to change to HH:MM:SS)
df2["ChipTime"]=pd.to_datetime(df2.ChipTime, unit="hour").dt.strftime('%H:%M:%S')
ValueError: cannot cast unit hour
df2["ChipTime"]=pd.to_timedelta(df2["ChipTime"])
ValueError: expected hh:mm:ss format
df2["ChipTime"]=df2["ChipTime"].astype(int)
ValueError: invalid literal for int() with base 10: '16:48'
I have tried more methods; the above are some of them. I am a beginner in Pandas, so please excuse me if I have made any blunder. Thanks

If you convert the values to datetimes with the format parameter of to_datetime, a default year, month and day (1900-01-01) are added. If necessary, you can then convert the values to times with Series.dt.time:
import pandas as pd
df2 = pd.DataFrame({'ChipTime':['16:48','10:48']})
df2["ChipTime1"]=pd.to_datetime(df2.ChipTime, format="%M:%S")
df2["ChipTime11"]=pd.to_datetime(df2.ChipTime, format="%M:%S").dt.time
Or, for timedeltas, prepend 00: as the default hour before calling to_timedelta:
df2["ChipTime2"]=pd.to_timedelta('00:' + df2["ChipTime"])
print (df2)
  ChipTime           ChipTime1 ChipTime11 ChipTime2
0    16:48 1900-01-01 00:16:48   00:16:48  00:16:48
1    10:48 1900-01-01 00:10:48   00:10:48  00:10:48
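If the goal is a plain HH:MM:SS string column, one option is to split the values and do the arithmetic by hand. The following is only a sketch, not part of the answer above: the sample value '75:30' and the column names ChipTimeDelta and ChipTimeHMS are made up for illustration, and it assumes every entry is a well-formed MM:SS string. Unlike format="%M:%S", this approach also copes with minute counts of 60 or more.

```python
import pandas as pd

# Hypothetical sample data; '75:30' shows a minute count above 59.
df2 = pd.DataFrame({"ChipTime": ["16:48", "75:30"]})

# Split "MM:SS" into two integer columns (0 = minutes, 1 = seconds).
parts = df2["ChipTime"].str.split(":", expand=True).astype(int)
total_seconds = parts[0] * 60 + parts[1]

# A timedelta column is convenient for arithmetic (sums, means, sorting).
df2["ChipTimeDelta"] = pd.to_timedelta(total_seconds, unit="s")

# Build a zero-padded "HH:MM:SS" string from the total seconds.
hours = total_seconds // 3600
minutes = (total_seconds % 3600) // 60
seconds = total_seconds % 60
df2["ChipTimeHMS"] = (
    hours.astype(str).str.zfill(2) + ":"
    + minutes.astype(str).str.zfill(2) + ":"
    + seconds.astype(str).str.zfill(2)
)

print(df2)
```

The timedelta column is the one to keep if the times will be compared or aggregated; the string column is only for display.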

Related

AWS Glue studio converting Pyspark string column to date returns null

I have data from an S3 bucket and want to convert the Date column from string to date. The current Date column is in the format 7/1/2022 12:0:15 AM.
Current code I am using in AWS Glue Studio to attempt the custom transformation:
def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    from pyspark.sql.functions import col, to_timestamp
    df = dfc.select(list(dfc.keys())[0]).toDF()
    df = df.withColumn('Date',to_timestamp(col("Date"), 'MM/dd/yyyy HH:MM:SS'))
    df_res = DynamicFrame.fromDF(df, glueContext, "df")
    return(DynamicFrameCollection({"CustomTransform0": df_res}, glueContext))
With MM/dd/yyyy HH:MM:SS date formatting, it runs but returns null for the Date column. When I try any other date format besides this, it errors out. I suspect the date formatting may be the issue, but I am not certain.
After converting string to timestamp you need to cast it to date type, like this:
df = df.withColumn(df_col, df[df_col].cast("date"))
We ended up removing the HH:MM:SS portion of the date format and this worked for our needs. I would still be interested if anyone can figure out how to get the hours, minutes, seconds, and AM/PM to work, but we can do without for now.

TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]

I have a column of years from the sunspots dataset.
I want to convert the integer column 'year' (e.g. 1992) to datetime format, then find the time delta, and eventually compute the cumulative total seconds to use as the time index column of a time series.
I am trying to use the following code but I get the error
TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
sunspots_df['year'] = pd.to_timedelta(pd.to_datetime(sunspots_df['year'], format='%Y') ).dt.total_seconds()
pandas.Timedelta "[r]epresents a duration, the difference between two dates or times." So you're trying to get Python to tell you the difference between a particular datetime and...nothing. That's why it's failing.
If it's important that you store your index this way (and there may be better ways), then you need to pick a start datetime and compute the difference to get a timedelta.
For example, this code...
import pandas as pd
df = pd.DataFrame({'year': [1990,1991,1992]})
diff = (pd.to_datetime(df['year'], format='%Y') - pd.to_datetime('1990', format='%Y'))\
.dt.total_seconds()
...returns a series whose values are seconds from January 1st, 1990. You'll note that it doesn't invoke pd.to_timedelta(), because it doesn't need to: the result of the subtraction is automatically a timedelta64 column.
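As a variation on the same idea, the reference point can be taken from the data itself instead of being hard-coded. This is a sketch only: the column name seconds is made up for illustration, and it assumes the first row holds the earliest year.

```python
import pandas as pd

df = pd.DataFrame({"year": [1990, 1991, 1992]})

# Parse the integer years as January 1st of each year.
stamps = pd.to_datetime(df["year"].astype(str), format="%Y")

# Subtracting a single Timestamp from a datetime column gives a timedelta column.
elapsed = stamps - stamps.iloc[0]

# total_seconds() is available on timedelta columns through the .dt accessor.
df["seconds"] = elapsed.dt.total_seconds()

print(df)
```

Neither 1990 nor 1991 is a leap year, so consecutive rows here are 365 * 86400 = 31536000 seconds apart.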

convert pandas datetime field with NAT entries to date

I have a Pandas dataframe with a field that is datetime datatype. Most of the values in the field are valid datetime values, but some are NaT.
I need to drop the time part of the datetime values for each value in the field, keeping the field as date datatype (not str). I tried the following:
df['mydate'] = df['mydate'].dt.date
It works fine if there are no NaT values in the column. However, if there are NaT values, it throws this error:
{AttributeError}Can only use .dt accessor with datetimelike values
I tried this alternative to skip over the NaT values:
df['mydate'] = [d.date if not pd.isnull(d) else None for d in df['mydate']]
but this converted the values in the column to:
<built-in method date of Timestamp object at 0x000002A06F6501C8>
Please advise how to ignore or skip the NaT values in the field when converting. I've had no luck googling for an answer, and I am trying to avoid looping over the entire dataframe with iterrows().
First convert the values to datetimes; after that the dt.date accessor works fine:
import numpy as np
import pandas as pd
df = pd.DataFrame({'mydate':['2015-04-04','2018-09-10', np.nan]})
df['new'] = pd.to_datetime(df['mydate'], errors='coerce').dt.date
print (df)
       mydate         new
0  2015-04-04  2015-04-04
1  2018-09-10  2018-09-10
2         NaN         NaT
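A small sketch of two ways to drop the time part while tolerating missing values. The sample values and the column names as_date and midnight are made up for illustration. Note that .dt.date produces an object column of datetime.date values, whereas .dt.normalize() keeps the datetime64 dtype and only resets the time to midnight, which may be preferable if further datetime operations are planned.

```python
import datetime

import pandas as pd

# Hypothetical sample data with one missing entry.
df = pd.DataFrame(
    {"mydate": ["2015-04-04 10:30:00", "2018-09-10 23:59:59", None]}
)

# errors="coerce" turns anything unparseable (including None) into NaT.
stamps = pd.to_datetime(df["mydate"], errors="coerce")

# Object column of datetime.date values; missing entries stay NaT.
df["as_date"] = stamps.dt.date

# Still a datetime64 column, with the time component set to 00:00:00.
df["midnight"] = stamps.dt.normalize()

print(df)
```

The list comprehension in the question printed bound methods because it referenced d.date without calling it; d.date() with parentheses returns the date.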

fixing dbtimestamp conversion mistake

I am using SSIS to convert a dd-mm-yyyy varchar input into a dbtimestamp field, using the data conversion transformation. My mistake was that the conversion produces a yyyy-mm-dd value where mm is the dd from the input and dd is the mm. So if I have an input 04-01-2019 00:00:000, it produces 2019-04-01 00:00:000.
My solution is to use substrings to transform the input into standard ISO format yyyy-MM-dd first and then convert to the datetime data type. My problem is: how am I going to correct the existing records (move dd to mm and mm to dd)? Probably substrings again?
Assuming your table only has incorrectly implicitly cast values (like '04-01-2019' (dd-MM-yyyy) to 20190401) and none which haven't been, you could use CONVERT and some style codes:
SELECT D,
CONVERT(date,CONVERT(varchar(10),V.D,101),103)
FROM (VALUES(CONVERT(date,'20190401'))) V(D);
As an UPDATE statement, that would be:
UPDATE YourTable
SET YourDateColumn = CONVERT(date,CONVERT(varchar(10),YourDateColumn ,101),103);
This converts the date back into a varchar with the format MM/dd/yyyy, and converts it back to a date but treats the value as dd/MM/yyyy (thus switching the day and month).

Pandas Time Series Conversion and Formatting

How do I convert a string in this format to a Pandas timestamp?
00:55:02:285
hours:minutes:seconds:milliseconds
I have a dataframe already with several columns in this format.
Pandas doesn't seem to recognize this format as a timestamp when I use any of the conversion functions, e.g. to_datetime()
Many Thanks.
I think you need parameter format in to_datetime:
df = pd.DataFrame({'times':['00:55:02:285','00:55:02:285']})
print (df)
times
0 00:55:02:285
1 00:55:02:285
print (pd.to_datetime(df.times, format='%H:%M:%S:%f'))
0 1900-01-01 00:55:02.285
1 1900-01-01 00:55:02.285
Name: times, dtype: datetime64[ns]
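If a duration is more useful than a timestamp anchored at 1900-01-01, the same strings can be turned into timedeltas. This is a sketch under the assumption that the last field is always exactly three digits of milliseconds; the column name duration is made up for illustration. It rewrites the final colon as a decimal point so that to_timedelta sees hh:mm:ss.fff.

```python
import pandas as pd

df = pd.DataFrame({"times": ["00:55:02:285", "00:55:02:285"]})

# Turn the last ":" into "." so "00:55:02:285" becomes "00:55:02.285".
as_hms = df["times"].str.replace(r":(\d+)$", r".\1", regex=True)

# to_timedelta understands "hh:mm:ss.fff" strings.
df["duration"] = pd.to_timedelta(as_hms)

print(df)
```

The same three-digit assumption applies to %f in the answer above: both approaches read the digits as a decimal fraction of a second, so a field written as 85 would be read as 0.85 s rather than 85 ms.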