Outputting pandas timestamp to tuple with just month and day - pandas

I have a pandas dataframe with a timestamp field that I have successfully converted to datetime format, and now I want to output just the month and day as a tuple for the first date value in the data frame. It is for a test, and the output must not have leading zeros. I have tried a number of things, but I cannot find an answer that avoids converting the timestamp to a string, which does not work.
This is the format
2021-05-04 14:20:00.426577
df_cleaned['trans_timestamp']=pd.to_datetime(df_cleaned['trans_timestamp']) is as far as I have got with the code.
I have been working on this for days and cannot get output the checker will accept.

Update
If you want to extract the month and day from the first record (solution proposed by @FObersteiner):
>>> df['trans_timestamp'].iloc[0].timetuple()[1:3]
(5, 4)
If you want to extract the month and day for every row of your dataframe, use:
# Setup
df = pd.DataFrame({'trans_timestamp': ['2021-05-04 14:20:00.426577']})
df['trans_timestamp'] = pd.to_datetime(df['trans_timestamp'])
# Extract tuple
df['month_day'] = df['trans_timestamp'].apply(lambda x: (x.month, x.day))
print(df)
# Output
trans_timestamp month_day
0 2021-05-04 14:20:00.426577 (5, 4)
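As a complement to the two approaches above, here is a minimal sketch that reads the month and day straight from the first Timestamp's attributes and builds the per-row tuples with the vectorized .dt accessor instead of apply. The second sample row and the variable names first_month_day and month_day are made up for illustration.

```python
import pandas as pd

# Sample data; the second row is invented for the example
df = pd.DataFrame({'trans_timestamp': ['2021-05-04 14:20:00.426577',
                                       '2021-11-23 08:05:10.000000']})
df['trans_timestamp'] = pd.to_datetime(df['trans_timestamp'])

# First row only: .month and .day are integers, so there are no leading zeros
first = df['trans_timestamp'].iloc[0]
first_month_day = (first.month, first.day)

# Every row: zip the vectorized month and day columns into tuples
month_day = list(zip(df['trans_timestamp'].dt.month,
                     df['trans_timestamp'].dt.day))

print(first_month_day)
print(month_day)
```

Because the values are integers rather than formatted strings, a checker that compares against (5, 4) should not be tripped up by zero padding.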

Related

TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]

I have a column of years from the sunspots dataset.
I want to convert the integer column 'year' (e.g. 1992) to datetime format, then find the time delta, and eventually compute the cumulative total seconds to serve as the time index column of a time series.
I am trying to use the following code but I get the error
TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]
sunspots_df['year'] = pd.to_timedelta(pd.to_datetime(sunspots_df['year'], format='%Y') ).dt.total_seconds()
pandas.Timedelta "[r]epresents a duration, the difference between two dates or times." So you're trying to get Python to tell you the difference between a particular datetime and...nothing. That's why it's failing.
If it's important that you store your index this way (and there may be better ways), then you need to pick a start datetime and compute the difference to get a timedelta.
For example, this code...
import pandas as pd
df = pd.DataFrame({'year': [1990,1991,1992]})
diff = (pd.to_datetime(df['year'], format='%Y') - pd.to_datetime('1990', format='%Y'))\
.dt.total_seconds()
...returns a series whose values are seconds from January 1st, 1990. You'll note that it doesn't invoke pd.to_timedelta(), because it doesn't need to: subtracting one datetime from another already yields a timedelta64[ns] series.
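If the reference point should simply be the first year in the data rather than a hard-coded 1990, the same subtraction can use the first parsed value. This is only a sketch: the sunspots_df contents are placeholder values, and the year is cast to a string before parsing as a precaution.

```python
import pandas as pd

# Placeholder data standing in for the sunspots 'year' column
sunspots_df = pd.DataFrame({'year': [1990, 1991, 1992]})

# Parse each year as 1 January of that year
years = pd.to_datetime(sunspots_df['year'].astype(str), format='%Y')

# Datetime minus datetime gives a timedelta, which exposes total_seconds()
sunspots_df['seconds'] = (years - years.iloc[0]).dt.total_seconds()

print(sunspots_df)
```

The first row is 0 by construction, and each later row grows by the number of seconds in the intervening years.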

How to separate the date, hour and timezone info using pandas?

I'm curious about how to use pandas to deal with this sort of info in a .csv file:
2022-08-11 11:50:01 America/Los_Angeles
My goal is to extract the date, hour and minute, and the timezone info for further analysis.
I have tried to lift out the date and time using:
df['Date'] = pd.to_datetime(df['datetime']).dt.date
but got an error because of the string at the end. Other than extracting the date and time using specific indices, is there any better and quicker way? Thank you so much.
A single datetime64 column in pandas can carry only one timezone, so values with mixed timezones cannot be parsed into it directly. You can start by splitting the datetime and timezone into separate columns:
df[['datetime', 'timezone']] = df['datetime'].str.rsplit(' ', n=1, expand=True)
df['datetime'] = pd.to_datetime(df['datetime']) # this column now has the datetime64[ns] type
Now you are able to do the following:
df['date_only'] = df['datetime'].dt.date
If you want to express all local date/times in America/Los_Angeles time:
df['LA_datetime'] = df.apply(lambda x: x['datetime'].tz_localize(tz=x['timezone']).tz_convert('America/Los_Angeles'), axis = 1)
You can change America/Los_Angeles to the timezone of your liking.
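Putting the steps above together, the following sketch splits the string, pulls out the date, hour and minute the question asks for, and converts every row to UTC so the column ends up with one timezone. The second sample row and the column names local, hour, minute and utc are assumptions made for the example.

```python
import pandas as pd

# Sample rows; the New York row is invented to show mixed timezones
df = pd.DataFrame({'datetime': ['2022-08-11 11:50:01 America/Los_Angeles',
                                '2022-08-11 14:50:01 America/New_York']})

# Split once from the right: timestamp text on the left, timezone name on the right
parts = df['datetime'].str.rsplit(' ', n=1, expand=True)
df['local'] = pd.to_datetime(parts[0])
df['timezone'] = parts[1]

# Date, hour and minute of the local wall-clock time
df['date'] = df['local'].dt.date
df['hour'] = df['local'].dt.hour
df['minute'] = df['local'].dt.minute

# Attach each row's own timezone, then convert everything to UTC
df['utc'] = [ts.tz_localize(tz).tz_convert('UTC')
             for ts, tz in zip(df['local'], df['timezone'])]

print(df[['date', 'hour', 'minute', 'timezone', 'utc']])
```

Converting to a common timezone such as UTC keeps the result in a proper timezone-aware datetime column, which makes later comparisons and arithmetic simpler.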

Extract the first 10 values of a column and create a new one [duplicate]

I am looking to convert datetime to date for a pandas datetime series.
I have listed the code below:
df = pd.DataFrame()
df = pandas.io.parsers.read_csv("TestData.csv", low_memory=False)
df['PUDATE'] = pd.Series([pd.to_datetime(date) for date in df['DATE_TIME']])
df['PUDATE2'] = datetime.datetime.date(df['PUDATE']) #Does not work
Can anyone guide me in right direction?
You can access the datetime methods of a pandas Series through the .dt accessor (in a similar way to how you would access string methods through .str). For your case, you can extract the date of your datetime column as:
df['PUDATE'].dt.date
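One thing worth knowing about .dt.date is that it returns Python datetime.date objects, so the resulting column has object dtype. If the column should stay a datetime64 type with the time set to midnight, .dt.normalize() is an alternative. A small sketch, with invented sample values and a made-up PUDATE_MIDNIGHT column name:

```python
import datetime
import pandas as pd

# Invented sample data in place of TestData.csv
df = pd.DataFrame({'DATE_TIME': ['2021-05-04 14:20:00', '2021-05-05 09:00:30']})
df['PUDATE'] = pd.to_datetime(df['DATE_TIME'])

# datetime.date objects, stored in an object-dtype column
df['PUDATE2'] = df['PUDATE'].dt.date

# Still datetime64, with the time component reset to 00:00:00
df['PUDATE_MIDNIGHT'] = df['PUDATE'].dt.normalize()

print(df.dtypes)
```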
This is a simple way to get the day of the month from a pandas datetime column:
#create a dataframe with dates as a string
test_df = pd.DataFrame({'dob':['2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04']})
#convert column to type datetime
test_df['dob']= pd.to_datetime(test_df['dob'])
# Extract day, month , year using dt accessor
test_df['DayOfMonth']=test_df['dob'].dt.day
test_df['Month']=test_df['dob'].dt.month
test_df['Year']=test_df['dob'].dt.year
I think you need to specify the format when parsing, for example
df['PUDATE2'] = pd.to_datetime(df['DATE_TIME'], format='%Y%m%d%H%M%S').dt.date
So you just need to know what format you are using

Excel binary Date field converted to numpy int64

I have a binary Excel file with a DATE column containing the value '7/31/2020'.
Upon reading the file, the DATE value is converted to numpy.int64 with the value 44043.
Can you tell me how to stop this conversion, or how to get the date as it appears in Excel?
This is my code to read the excel file
>>df = pd.read_excel('hello.xlsb', engine='pyxlsb')
>>df[DATE][0]
>>44043
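The integer is an Excel serial date: a day count in which 1 January 1900 is day 1, so day 0 would be the nonexistent 0th of January. Excel also counts a 29 February 1900 that never existed, which is why a fudge factor of 2 is needed when adding the serial to 1 January 1900: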
>>> import datetime
>>> d = datetime.date(1900, 1, 1) + datetime.timedelta(days=44043 - 2)
>>> d
datetime.date(2020, 7, 31)
>>> d.isoformat()
'2020-07-31'
>>> d.strftime("%m/%d/%Y")
'07/31/2020'
See the strftime docs for other formatting options.
You could try parsing the column as a date format when reading it in:
df = pd.read_excel('hello.xlsb', engine='pyxlsb', parse_dates=[DATE])
DATE is the variable with the column name expected to be in date format.
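The same arithmetic can be wrapped in a small helper that uses 30 December 1899 as the starting point, which folds the offset of 2 into the epoch. This is a sketch only: the names EXCEL_EPOCH and excel_serial_to_date are made up here, and the conversion is meant for serials after February 1900.

```python
import datetime

# 1 January 1900 minus the 2-day offset discussed above
EXCEL_EPOCH = datetime.date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert an Excel serial day number to a datetime.date."""
    return EXCEL_EPOCH + datetime.timedelta(days=int(serial))

d = excel_serial_to_date(44043)
print(d.isoformat())
print(d.strftime("%m/%d/%Y"))
```

For a whole column, pd.to_datetime(df[DATE], unit='D', origin='1899-12-30') should apply the same conversion in one call, assuming the column holds only numeric serials.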

How do I convert a non zero padded day string to a useful date in pandas

I'm trying to import a date string with non-zero padded day, zero padded month, and year without century to create a datetime e.g. (11219 to 01/12/19). However, pandas cannot distinguish between the day and the month (e.g. 11219 could be 11th February, 2019 or 1st December, 2019).
I've tried using 'dayfirst' and the '#' in the day, e.g. %#d, but nothing works. Code below, any advice?
Code:
df_import['newDate'] = pd.to_datetime(df_import['Date'], format='%d/%m/%Y', dayfirst = True)
Error:
time data '11219' does not match format '%d/%m/%Y' (match)
Since only the day is not zero-padded, the dates are unambiguous. They can simply be parsed by Pandas if we add the pad:
pd.to_datetime(df_import['Date'].str.zfill(6), format='%d%m%y')
use zfill()
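The .str accessor only works when the Date column holds strings. If the file was read with default settings, a value like 11219 may well have arrived as an integer, in which case it needs a cast first. A minimal sketch with made-up sample values:

```python
import pandas as pd

# Made-up sample: day without zero padding, then two-digit month and year
df_import = pd.DataFrame({'Date': [11219, 111219, 10120]})

# Cast to string, then left-pad to six characters so the day has two digits
padded = df_import['Date'].astype(str).str.zfill(6)

# Parse as day, month, two-digit year with no separators
df_import['newDate'] = pd.to_datetime(padded, format='%d%m%y')

print(df_import)
```

With the padding in place, 11219 becomes 011219 and is read as 1 December 2019, while 111219 is read as 11 December 2019.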
A custom function can also be used if you want to handle more cases.
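def getDate(value):
    # Pad the day to two digits, then parse as ddmmyy; unparseable values become NaT
    return pd.to_datetime(str(value).zfill(6), format='%d%m%y', errors='coerce')
df_import['newDate'] = df_import['Date'].apply(getDate)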