List comprehension vs lambda function in a pandas DataFrame

I'm trying to convert decimal years to datetime format in Python. I've managed to make the conversion using a list comprehension, but I cannot get a lambda function working to do the same thing. What am I doing wrong? How can I use a lambda function to make this conversion?
import calendar
from datetime import datetime, timedelta
import pandas as pd
df = pd.DataFrame(data=[2021.3, 2021.6], columns=['dec_date'])
# define a function to convert decimal dates to datetime
def convert_partial_year(number):
    # round down to get the year
    year = int(number)
    # get the fractional year
    year_fraction = number - year
    # get the number of days in the given year
    days_in_year = 365 + calendar.isleap(year)
    # convert the fractional year into days
    d = timedelta(days=year_fraction * days_in_year)
    # convert the year into a date format
    day_one = datetime(year, 1, 1)
    # add the days into the year onto the date format
    date = d + day_one
    # return the result
    return date
# my lambda function does not work
df.assign(
    date=lambda x: convert_partial_year(x.dec_date)
)
# my list comprehension does work
df.assign(
    date=[convert_partial_year(x) for x in df.dec_date]
)
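The lambda fails because DataFrame.assign calls it with the whole DataFrame, so x.dec_date is a Series, and int() cannot convert a multi-element Series to a single integer. A minimal sketch of one way around this, assuming the scalar function stays as it is: apply it element by element inside the lambda. The condensed convert_partial_year below is a shortened restatement of the function above, not a different algorithm.

```python
import calendar
from datetime import datetime, timedelta

import pandas as pd


def convert_partial_year(number):
    """Convert a decimal year such as 2021.3 to a datetime."""
    year = int(number)
    days_in_year = 365 + calendar.isleap(year)
    return datetime(year, 1, 1) + timedelta(days=(number - year) * days_in_year)


df = pd.DataFrame({"dec_date": [2021.3, 2021.6]})

# The lambda receives the DataFrame; Series.apply then calls the scalar
# function once per value, which is what the list comprehension does too.
result = df.assign(date=lambda x: x.dec_date.apply(convert_partial_year))
print(result)
```

The list comprehension works for the same reason: it hands the function one float at a time.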

Related

Convert HH:MM:SS and MM:SS to seconds in same column in Pandas

I am trying to convert a timestamp string into an integer and having trouble.
The column in my dataframe looks like this:
time
30:03
1:15:02
I have tried df['time'].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1])), but that only works if every value in the column has the same format, and my column is mixed. Some people took 30 minutes to finish the task and some took an hour or more.
You can make a custom convert function where you check for time format:
def convert(x):
    x = x.split(":")
    if len(x) == 2:
        return int(x[0]) * 60 + int(x[1])
    return int(x[0]) * 3600 + int(x[1]) * 60 + int(x[2])
df["time"] = df["time"].apply(convert)
print(df)
Prints:
time
0 1803
1 4502
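If the column might also contain other lengths (for example plain seconds), the format check can be replaced by a fold over the fields. This is only a sketch of an alternative, and the name to_seconds is made up for illustration:

```python
def to_seconds(text):
    """Convert "SS", "MM:SS" or "HH:MM:SS" to a number of seconds."""
    total = 0
    for part in text.split(":"):
        # Each step shifts the running total up by one base-60 place.
        total = total * 60 + int(part)
    return total


times = ["30:03", "1:15:02"]
seconds = [to_seconds(t) for t in times]
print(seconds)
```

With a DataFrame it would be used the same way as convert, i.e. df["time"].apply(to_seconds).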

Pandas: Read dates in column which has different formats

I would like to read a csv, with dates in a column, but the dates are in different formats within the column.
Specifically, some dates are in "dd/mm/yyyy" format, and some are in "4####" format (excel 1900 date system, serial number represents days elapsed since 1900/1/1).
Is there any way to use read_csv or pandas.to_datetime to convert the column to datetime?
Have tried using pandas.to_datetime with no parameters to no avail.
df["Date"] = pd.to_datetime(df["Date"])
Returns
ValueError: year 42613 is out of range
Presumably it can read the "dd/mm/yyyy" format fine but produces an error for the "4####" format.
Note: the column is mixed type as well
Appreciate any help
Example
dates = ['25/07/2016', '42315']
df = pd.DataFrame(dates, columns=['Date'])
# desired output: ['25/07/2016', '07/11/2015']
Let's try:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = (
    pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
    + pd.Timestamp(year=1899, month=12, day=30)
)
df['Date'] = dates
Or alternatively with seconds conversion:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = pd.to_datetime(
    (df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
    unit='s'
)
df['Date'] = dates
df:
Date
0 2016-07-25
1 2015-11-07
Explanation:
First convert to datetime all that can be done with pd.to_datetime:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
Check which values couldn't be converted:
m = dates.isna()
Convert NaTs
a. Offset as days since 1899-12-30 using TimedeltaIndex + pd.Timestamp:
dates.loc[m] = (
    pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
    + pd.Timestamp(year=1899, month=12, day=30)
)
b. Or convert serial days to seconds mathematically:
dates.loc[m] = pd.to_datetime(
    (df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
    unit='s'
)
Update the Date column:
df['Date'] = dates
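The same two-step logic (try the text format first, fall back to the serial number) can be checked without pandas. The sketch below assumes whole-day serials and the 1899-12-30 offset used above, which only holds for serials from 1900-03-01 onward; parse_mixed_date is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta

# Day zero used for Excel's 1900 date system in the answer above.
EXCEL_EPOCH = date(1899, 12, 30)


def parse_mixed_date(value):
    """Parse either a dd/mm/yyyy string or an Excel serial day number."""
    text = str(value).strip()
    try:
        return datetime.strptime(text, "%d/%m/%Y").date()
    except ValueError:
        # Not a dd/mm/yyyy string, so treat it as a whole-day Excel serial.
        return EXCEL_EPOCH + timedelta(days=int(text))


parsed = [parse_mixed_date(v) for v in ["25/07/2016", "42315"]]
print(parsed)
```

The constant 25569 in option b is the serial number of 1970-01-01, which is why subtracting it gives days since the Unix epoch.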

How can I subtract datetime in Python?

I'm trying to make an age calculator in Python, and I have a problem subtracting the user's date of birth from today's date. I have tried converting to float, but that doesn't work. I tried subtracting the variables themselves, but that doesn't work either.
age_str = input("Enter your birthday on dd-mm-yy Format:")
age = datetime.datetime.strptime(age_str, '%d-%m-%Y')
today_str = datetime.date.today()
today = datetime.datetime.strptime(today_str, '%d-%m-%Y')
total = age - today
from datetime import date
def calculate_age(born):
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
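To connect this to the question's input: date.today() already returns a date object, so it should not be passed through strptime; only the typed birthday needs parsing. A rough sketch follows, where the optional today parameter is an addition for repeatable examples and the birthday string is invented.

```python
from datetime import date, datetime


def calculate_age(born, today=None):
    """Return the age in whole years; `today` defaults to the current date."""
    today = today or date.today()
    # The tuple comparison is True (counts as 1) if the birthday has not
    # happened yet this year, so one year is subtracted in that case.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))


# Parse a dd-mm-yyyy string into a date, as the question's '%d-%m-%Y' expects.
born = datetime.strptime("14-07-1990", "%d-%m-%Y").date()

age_before_birthday = calculate_age(born, today=date(2024, 7, 13))
age_on_birthday = calculate_age(born, today=date(2024, 7, 14))

# Subtracting two date objects directly gives a timedelta, not a number.
days_alive = (date(2024, 7, 14) - born).days
print(age_before_birthday, age_on_birthday, days_alive)
```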

Remove a few dates from a date range

from datetime import timedelta, date
def daterange(date1, date2):
    for n in range((date2 - date1).days + 1):
        yield date1 + timedelta(n)
start_dt = date(2015, 12, 20)
end_dt = date(2016, 1, 11)
for dt in daterange(start_dt, end_dt):
    print(dt.strftime("%Y-%m-%d"))
I have the date range shown above, but I need to ignore a few dates within it. These dates are in a DataFrame.
How can I remove these dates from the date range? The DataFrame with the distinct dates is below.
Pardata = spark.read.parquet("/mnt/Test/data.parquet")
Pardata.createOrReplaceTempView("parfile")
ParRes = spark.sql("SELECT distinct date FROM parfile ")
Use left_anti join:
dates = [[dt.strftime("%Y-%m-%d")] for dt in daterange(start_dt, end_dt)]
dates_df = spark.createDataFrame(dates, ["date"])
dates_df.join(ParRes, dates_df["date"] == ParRes["date"], "left_anti").show()
First, create a DataFrame dates_df from that range of dates. Then use a left_anti join on the key date, which keeps only the rows of dates_df whose date has no match in the ParRes DataFrame.
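If the list of dates to skip is small enough to hold in driver memory, the same anti-join idea can be sketched in plain Python with a set. The excluded dates below are invented for illustration; in practice they would have to be collected from ParRes first, which is not shown here.

```python
from datetime import date, timedelta


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    for n in range((end - start).days + 1):
        yield start + timedelta(days=n)


# Hypothetical dates to skip, standing in for the values held in ParRes.
excluded = {date(2015, 12, 25), date(2016, 1, 1)}

# Keep only dates with no match in the excluded set (an anti-join).
kept = [d for d in daterange(date(2015, 12, 20), date(2016, 1, 11))
        if d not in excluded]
print(len(kept))
```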

Get weekday id in odoo

I want to get the weekday ID from my datetime field.
print(datetime.today().weekday())  # prints 4
my_datetime = self.start
print(my_datetime)  # prints 2017-07-14 09:47:14
How can I replace datetime.today() with my_datetime?
Try this example, which returns the day name:
my_datetime = self.start
day_name = datetime.strptime(my_datetime, '%Y-%m-%d %H:%M:%S')
print(day_name.strftime("%A"))
Use Odoo's built-in convert methods to do that:
from odoo import fields # usually already done for odoo models
if self.start:
    day = fields.Datetime.from_string(self.start).weekday()
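Outside Odoo, the same lookup can be sketched with the standard library alone. This assumes the field value is either a string in the format printed in the question or already a datetime object, which may depend on the Odoo version; weekday_id is a hypothetical helper name.

```python
from datetime import datetime

# Format of the value printed in the question.
ODOO_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def weekday_id(value):
    """Return 0 (Monday) to 6 (Sunday) for a datetime or a formatted string."""
    if isinstance(value, str):
        value = datetime.strptime(value, ODOO_DATETIME_FORMAT)
    return value.weekday()


from_text = weekday_id("2017-07-14 09:47:14")
from_object = weekday_id(datetime(2017, 7, 14, 9, 47, 14))
print(from_text, from_object)
```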