from datetime import timedelta, date

def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)

start_dt = date(2015, 12, 20)
end_dt = date(2016, 1, 11)
for dt in daterange(start_dt, end_dt):
    print(dt.strftime("%Y-%m-%d"))
I have the date range above, but a few dates in this range need to be ignored. These dates are in a DataFrame.
How can I remove them from the date range? Any suggestions are welcome. The DataFrame with the distinct dates is built as follows:
Pardata = spark.read.parquet("/mnt/Test/data.parquet")
Pardata.createOrReplaceTempView("parfile")
ParRes = spark.sql("SELECT distinct date FROM parfile ")
Use a left_anti join:
dates = [[dt.strftime("%Y-%m-%d")] for dt in daterange(start_dt, end_dt)]
dates_df = spark.createDataFrame(dates, ["date"])
dates_df.join(ParRes, dates_df["date"] == ParRes["date"], "left_anti").show()
First, create a DataFrame dates_df from the range of dates. Then use a left_anti join on the date key: it keeps only the rows of dates_df that have no matching date in the ParRes DataFrame.
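If the set of dates to ignore is small, the exclusion can also be done on the driver without a join. A minimal sketch, assuming the distinct dates have already been collected into a Python set of datetime.date objects (for example from ParRes.collect()); the hard-coded set below is hypothetical sample data, and if the column holds strings you would compare against dt.strftime("%Y-%m-%d") instead:

```python
from datetime import date, timedelta


def daterange(date1, date2):
    """Yield every date from date1 to date2, inclusive."""
    for n in range((date2 - date1).days + 1):
        yield date1 + timedelta(n)


def daterange_excluding(date1, date2, ignored):
    """Yield the dates in [date1, date2] that are not in the ignored set."""
    for d in daterange(date1, date2):
        if d not in ignored:
            yield d


# Hypothetical dates to ignore; in practice these would come from ParRes.
ignored = {date(2015, 12, 25), date(2016, 1, 1)}

kept = list(daterange_excluding(date(2015, 12, 20), date(2016, 1, 11), ignored))
for d in kept:
    print(d.strftime("%Y-%m-%d"))
```

A set gives constant-time membership checks, but this only makes sense while the ignored dates fit comfortably in driver memory; for large data the left_anti join keeps the work in Spark.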
I'm trying to convert decimal years to datetime format in Python. I've managed to make the conversion using a list comprehension, but I cannot get a lambda function working to do the same thing. What am I doing wrong? How can I use a lambda function to make this conversion?
from datetime import datetime, timedelta
import calendar

import pandas as pd

df = pd.DataFrame(data=[2021.3, 2021.6], columns=['dec_date'])

# define a function to convert decimal dates to datetime
def convert_partial_year(number):
    # round down to get the year
    year = int(number)
    # get the fractional year
    year_fraction = number - year
    # get the number of days in the given year
    days_in_year = 365 + calendar.isleap(year)
    # convert the fractional year into days
    d = timedelta(days=year_fraction * days_in_year)
    # convert the year into a date format
    day_one = datetime(year, 1, 1)
    # add the days into the year onto the date format
    date = d + day_one
    # return the result
    return date

# my lambda function does not work
df.assign(
    date=lambda x: convert_partial_year(x.dec_date)
)

# my list comprehension does work
df.assign(
    date=[convert_partial_year(x) for x in df.dec_date]
)
I would like to read a csv, with dates in a column, but the dates are in different formats within the column.
Specifically, some dates are in "dd/mm/yyyy" format, and some are in "4####" format (Excel 1900 date system, where the serial number represents days elapsed since 1900/1/1).
Is there any way to use read_csv or pandas.to_datetime to convert the column to datetime?
I have tried using pandas.to_datetime with no parameters, to no avail.
df["Date"] = pd.to_datetime(df["Date"])
Returns
ValueError: year 42613 is out of range
Presumably it can read the "dd/mm/yyyy" format fine but produces an error for the "4####" format.
Note: the column also has mixed types.
Any help is appreciated.
Example
import pandas as pd
dates = ['25/07/2016', '42315']
df = pd.DataFrame(dates, columns=['Date'])
# desired output: ['25/07/2016', '07/11/2015']
Let's try:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = (
pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
+ pd.Timestamp(year=1899, month=12, day=30)
)
df['Date'] = dates
Alternatively, convert via seconds since the Unix epoch:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = pd.to_datetime(
(df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
unit='s'
)
df['Date'] = dates
df:
Date
0 2016-07-25
1 2015-11-07
Explanation:
First, convert everything that pd.to_datetime can parse; values it cannot parse become NaT:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
Check which values couldn't be converted:
m = dates.isna()
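Convert the NaTs:
a. Treat each value as an offset in days from 1899-12-30, using TimedeltaIndex + pd.Timestamp. The epoch is 1899-12-30 rather than 1900-01-01 because Excel's serial 1 is 1900-01-01 and Excel also counts a nonexistent 29 February 1900: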
dates.loc[m] = (
pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
+ pd.Timestamp(year=1899, month=12, day=30)
)
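b. Or convert the serial days to seconds arithmetically: 25569 is the Excel serial number of 1970-01-01 (the Unix epoch), and each day has 86400 seconds: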
dates.loc[m] = pd.to_datetime(
(df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
unit='s'
)
Update the Date column:
df['Date'] = dates
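Putting the steps together, a self-contained sketch of approach (a) as a reusable function. The function name parse_mixed_dates and the EXCEL_EPOCH constant are hypothetical names for illustration, and it assumes every value that fails dd/mm/yyyy parsing is a whole-number serial (anything else will make astype(int) raise):

```python
import pandas as pd

# Excel's 1900 date system: serial 1 is 1900-01-01 and Excel also counts a
# nonexistent 1900-02-29, so for serials from 61 onward the effective
# epoch is 1899-12-30.
EXCEL_EPOCH = pd.Timestamp('1899-12-30')


def parse_mixed_dates(col):
    """Parse dd/mm/yyyy strings; treat values that fail as Excel serials."""
    dates = pd.to_datetime(col, dayfirst=True, errors='coerce')
    m = dates.isna()
    # Assumption: the unparsed values are whole-number serials (str or int).
    serials = col[m].astype(int)
    # The result keeps the index of the NaT rows, so .loc aligns by label.
    dates.loc[m] = EXCEL_EPOCH + pd.to_timedelta(serials, unit='D')
    return dates


df = pd.DataFrame({'Date': ['25/07/2016', '42315']})
df['Date'] = parse_mixed_dates(df['Date'])
print(df)
```

Using pd.to_timedelta on the serial Series (rather than a TimedeltaIndex) keeps the row labels, so the assignment does not depend on positional order.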
The DataFrame has two date columns of "object" dtype. StartDate and EndDate are in mm/dd/yyyy format.
Name StartDate EndDate
bou1 1/9/2017 1/10/2017
bou2 12/31/2016 1/10/2017
Output:
Name StartDate EndDate Diff
bou1 1/9/2017 1/10/2017 1
bou2 12/31/2016 1/10/2017 10
Any suggestions would be appreciated!
You first need to convert those columns to datetime and then subtract.
Try:
import numpy as np
import pandas as pd

df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['difInDate'] = abs(df['StartDate'].sub(df['EndDate'], axis=0)) / np.timedelta64(1, 'D')
print(df['difInDate'])
abs() only makes the day count positive: StartDate is earlier than EndDate, so StartDate - EndDate is negative.
Alternatively, you can use df['EndDate'].sub(df['StartDate']) and drop abs().
import pandas as pd

# Recreating your dataframe with dates stored as strings
df = pd.DataFrame({'Name' : ['bou1', 'bou2'],
'StartDate': ['01/09/2017','12/31/2016'],
'EndDate' : ['01/10/2017', '01/10/2017']})
# Convert the date strings with pd.to_datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# Subtracting gives timedeltas; .dt.days extracts the whole number of days
df['Diff'] = (df['EndDate'] - df['StartDate']).dt.days
# Show the columns in the requested order
df[['Name', 'StartDate', 'EndDate', 'Diff']]
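Since the question states the columns are mm/dd/yyyy, passing that format explicitly removes any day/month guessing. A minimal sketch that also leaves the original string columns untouched, matching the desired output (the intermediate names fmt, start and end are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['bou1', 'bou2'],
                   'StartDate': ['1/9/2017', '12/31/2016'],
                   'EndDate': ['1/10/2017', '1/10/2017']})

# An explicit format avoids month/day ambiguity; %m and %d also accept
# the non-zero-padded values shown in the question.
fmt = '%m/%d/%Y'
start = pd.to_datetime(df['StartDate'], format=fmt)
end = pd.to_datetime(df['EndDate'], format=fmt)

# Only the new Diff column is added; StartDate/EndDate stay as strings.
df['Diff'] = (end - start).dt.days
print(df)
```

.dt.days gives integers, whereas dividing by np.timedelta64(1, 'D') gives floats; for whole-day dates the values are the same.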
I want to retrieve the records where Date <= paramdate, and also the records where Date = '1753/01/01'.
Thanks
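You could use DATEDIFF.
DATEDIFF returns the difference between two dates in the datepart you specify (DAY, MONTH, YEAR, HOUR, and so on).
Example:
`DATEDIFF(DAY, '01/01/2017', '02/01/2017')`
This returns 1 because the datepart is DAY, assuming a day-first date format (SET DATEFORMAT dmy). Under a month-first format such as the us_english default, '02/01/2017' is read as 1 February and the result is 31.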
Note: DATEDIFF can also return a negative value when the start date is later than the end date. Example (day-first format):
`DATEDIFF(DAY, '02/01/2017', '01/01/2017')`
returns -1.
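The dependence on date format can be illustrated outside SQL Server. A small Python sketch (day_diff is a hypothetical helper; Python's strptime stands in for SQL Server's string parsing, and DATEFORMAT settings are not modeled). It matches DATEDIFF(DAY, ...) only for values without a time part, since DATEDIFF counts day boundaries crossed:

```python
from datetime import datetime


def day_diff(start, end, fmt):
    """Days from start to end, both parsed with fmt (negative if end < start)."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


# The same literals give different spans depending on the assumed format.
print(day_diff('01/01/2017', '02/01/2017', '%d/%m/%Y'))  # 1  (day first)
print(day_diff('01/01/2017', '02/01/2017', '%m/%d/%Y'))  # 31 (month first)
print(day_diff('02/01/2017', '01/01/2017', '%d/%m/%Y'))  # -1 (reversed order)
```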
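To compare a date against a value written in a specific format, use CONVERT to turn the date into that format and compare as usual. The style argument has no effect when converting one string to another, so cast a string literal to DATE first; style 103 is dd/mm/yyyy.
Example:
WHERE CONVERT(VARCHAR(10), CAST('2017-01-01' AS DATE), 103) = '01/01/2017'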
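If Date is stored as a date type, the filter in the question can be written directly as two conditions joined by OR. A minimal sketch using Python's built-in sqlite3 as a stand-in for the real database; the table name records, the sample rows and the paramdate value are assumptions, and dates are stored as ISO yyyy-mm-dd text, which sorts in date order so <= compares correctly:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (id INTEGER, "Date" TEXT)')
conn.executemany(
    'INSERT INTO records VALUES (?, ?)',
    [(1, '1753-01-01'), (2, '2016-05-04'), (3, '2017-03-15'), (4, '2018-08-20')],
)

paramdate = '2017-06-30'  # hypothetical parameter value
# Both conditions are bound parameters rather than concatenated strings.
rows = conn.execute(
    'SELECT id FROM records WHERE "Date" <= ? OR "Date" = ? ORDER BY id',
    (paramdate, '1753-01-01'),
).fetchall()
print([r[0] for r in rows])
```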
I've got a query that retrieves information for the last 2 years up to the current date:
DateTime jodaCurrentDate = new DateTime();
DateTimeFormatter formatter = DateTimeFormat.forPattern("YYYY-MM-dd HH:mm");
String startingDate = formatter.print(jodaCurrentDate.minusYears(2));
String endingDate = formatter.print(jodaCurrentDate);
"DATE BETWEEN TO_DATE (\'"+ startingDate +"\',\'YYYY-MM-DD HH24:MI\') AND " +
"TO_DATE (\'"+ endingDate +"\',\'YYYY-MM-DD HH24:MI\')";
Is it possible to retrieve data for only particular months, for example 01, 02, 03, 10, 11 and 12?
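This WHERE clause filters for dates between two dates and also filters on the month. Replace some_date with your date column. Note that to_date('12/31/2015','MM/DD/YYYY') is midnight at the start of 31 December, so if some_date has a time component, later times on that day fall outside the range.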
where some_date between to_date('01/01/2015','MM/DD/YYYY') and to_date('12/31/2015','MM/DD/YYYY')
and extract(month from some_date) in (1, 2, 3, 10, 11, 12);
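The same two conditions, a date window plus a set of allowed months, expressed in Python for illustration. EXTRACT(month FROM ...) corresponds to datetime.month here; the helper name and the sample timestamps are made up, and the window mirrors the Oracle example rather than the question's rolling 2-year range:

```python
from datetime import datetime

WANTED_MONTHS = {1, 2, 3, 10, 11, 12}


def filter_by_window_and_month(timestamps, start, end, months=WANTED_MONTHS):
    """Keep timestamps within [start, end] whose month is in months."""
    return [ts for ts in timestamps if start <= ts <= end and ts.month in months]


start = datetime(2015, 1, 1)
end = datetime(2015, 12, 31, 23, 59)
# One hypothetical timestamp per month of 2015, plus one outside the window.
sample = [datetime(2015, m, 15) for m in range(1, 13)] + [datetime(2016, 1, 15)]

kept = filter_by_window_and_month(sample, start, end)
print([ts.month for ts in kept])
```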