Remove a few dates from a date range - pandas

from datetime import timedelta, date

def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)

start_dt = date(2015, 12, 20)
end_dt = date(2016, 1, 11)
for dt in daterange(start_dt, end_dt):
    print(dt.strftime("%Y-%m-%d"))
I have a date range as stated above, but there are a few dates in this range that I need to ignore. These dates are in a DataFrame.
How can I take these dates out of the date range? Any suggestions are welcome. The DataFrame with the distinct dates is created below.
Pardata = spark.read.parquet("/mnt/Test/data.parquet")
Pardata.createOrReplaceTempView("parfile")
ParRes = spark.sql("SELECT distinct date FROM parfile ")

Use a left_anti join:
dates = [[dt.strftime("%Y-%m-%d")] for dt in daterange(start_dt, end_dt)]
dates_df = spark.createDataFrame(dates, ["date"])
dates_df.join(ParRes, dates_df["date"] == ParRes["date"], "left_anti").show()
First, create a DataFrame dates_df from that range of dates. Then use a left_anti join, which keeps only the rows of dates_df whose date does not appear in ParRes, matching on the key date.
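Since the question title mentions pandas, a rough equivalent on the pandas side (just a sketch, assuming ParRes is small enough to collect to the driver with toPandas) would be:
import pandas as pd

# build the full range in pandas, then drop the dates present in ParRes
all_dates = pd.date_range(start_dt, end_dt, freq="D")
ignore = pd.to_datetime(ParRes.toPandas()["date"])   # assumes ParRes fits in driver memory
remaining = all_dates[~all_dates.isin(ignore)]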

Related

list comprehension vs lambda function in pandas dataframe

I'm trying to convert decimal years to datetime format in Python. I've managed to make the conversion using a list comprehension, but I cannot get a lambda function working to do the same thing. What am I doing wrong? How can I use a lambda function to make this conversion?
import pandas as pd
from datetime import datetime, timedelta
import calendar

df = pd.DataFrame(data=[2021.3, 2021.6], columns=['dec_date'])

# define a function to convert decimal dates to datetime
def convert_partial_year(number):
    # round down to get the year
    year = int(number)
    # get the fractional year
    year_fraction = number - year
    # get the number of days in the given year
    days_in_year = 365 + calendar.isleap(year)
    # convert the fractional year into days
    d = timedelta(days=year_fraction * days_in_year)
    # convert the year into a date format
    day_one = datetime(year, 1, 1)
    # add the days into the year onto the date format
    date = d + day_one
    # return the result
    return date

# my lambda function does not work
df.assign(
    date=lambda x: convert_partial_year(x.dec_date)
)

# my list comprehension does work
df.assign(
    date=[convert_partial_year(x) for x in df.dec_date]
)
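A likely explanation, offered here as an aside rather than as part of the original post: df.assign passes the entire DataFrame to the lambda, so convert_partial_year receives the whole dec_date Series and int(number) fails on it, whereas the list comprehension calls the function one value at a time. Applying the function element-wise inside the lambda should behave like the list comprehension. A minimal sketch:
# sketch: apply the conversion element-wise inside the lambda
df.assign(
    date=lambda x: x.dec_date.apply(convert_partial_year)
)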

Pandas: Read dates in column which has different formats

I would like to read a CSV with dates in a column, but the dates are in different formats within that column.
Specifically, some dates are in "dd/mm/yyyy" format, and some are in "4####" format (the Excel 1900 date system, where the serial number represents days elapsed since 1900-01-01).
Is there any way to use read_csv or pandas.to_datetime to convert the column to datetime?
Have tried using pandas.to_datetime with no parameters to no avail.
df["Date"] = pd.to_datetime(df["Date"])
Returns
ValueError: year 42613 is out of range
Presumably it can read the "dd/mm/yyyy" format fine but produces an error for the "4####" format.
Note: the column is mixed type as well
Appreciate any help
Example
dates = ['25/07/2016', '42315']
df = pd.DataFrame(dates, columns=['Date'])
# desired output: ['25/07/2016', '07/11/2015']
Let's try:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = (
    pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
    + pd.Timestamp(year=1899, month=12, day=30)
)
df['Date'] = dates
Or alternatively with seconds conversion:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = pd.to_datetime(
    (df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
    unit='s'
)
df['Date'] = dates
df:
Date
0 2016-07-25
1 2015-11-07
Explanation:
First convert to datetime all that can be done with pd.to_datetime:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
Check which values couldn't be converted:
m = dates.isna()
Convert the NaTs:
a. Offset as days since 1899-12-30 using TimedeltaIndex + pd.Timestamp:
dates.loc[m] = (
    pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
    + pd.Timestamp(year=1899, month=12, day=30)
)
b. Or convert serial days to seconds mathematically:
dates.loc[m] = pd.to_datetime(
    (df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
    unit='s'
)
Update the Date column:
df['Date'] = dates
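As an aside on the constants used above (not part of the original answer): Excel's 1900 date system effectively counts days from 1899-12-30 once its historical leap-year quirk is accounted for, and 25569 is simply the number of days between that epoch and the Unix epoch of 1970-01-01, which is easy to verify:
import pandas as pd

# days between the Excel epoch used above and the Unix epoch
print((pd.Timestamp("1970-01-01") - pd.Timestamp("1899-12-30")).days)  # 25569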

Difference between 2 dates which are objects in dataframe

The DataFrame has 2 date columns of "object" datatype. StartDate and EndDate are in mm/dd/yyyy format.
Name StartDate EndDate
bou1 1/9/2017 1/10/2017
bou2 12/31/2016 1/10/2017
Output:
Name StartDate EndDate Diff
bou1 1/9/2017 1/10/2017 1
bou2 12/31/2016 1/10/2017 10
Any suggestions would be appreciated !!
You first need to convert those columns to datetime and then subtract.
Try:
import numpy as np

df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['difInDate'] = abs(df['StartDate'].sub(df['EndDate'], axis=0)) / np.timedelta64(1, 'D')
print(df['difInDate'])
abs is just there to make the number of days positive, because the later date is being subtracted from the earlier one.
Alternatively, you can use df['EndDate'].sub(df['StartDate']) instead.
import pandas as pd

# Recreating your dataframe with dates stored as strings
df = pd.DataFrame({'Name': ['bou1', 'bou2'],
                   'StartDate': ['01/09/2017', '12/31/2016'],
                   'EndDate': ['01/10/2017', '01/10/2017']})
# Date strings converted with pd.to_datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# .dt handles your calculation and .days outputs the result in days
df['Diff'] = (df['EndDate'] - df['StartDate']).dt.days
# Just prints the columns in your order
df[['Name', 'StartDate', 'EndDate', 'Diff']]

Query to retrieve data having Date = '1753/01/01'

I want to retrieve those records which have Date <= paramdate and also those records which have Date = '1753/01/01'.
Thanks
You could use DATEDIFF.
DATEDIFF returns the difference in whatever unit you choose (day, month, or year), based on what you want to compare.
Example:
`DATEDIFF(DAY, '01/01/2017', '02/01/2017')`
This returns 1, as the comparison is done in days.
Note: DATEDIFF can also return a negative value when the first date is later than the second. Example:
`DATEDIFF(DAY, '02/01/2017', '01/01/2017')`
will return -1.
When you want to compare the date in a specific string format, you can use CONVERT to turn the date column into that format and compare it as usual.
Example:
WHERE CONVERT(VARCHAR(10), Date, 103) = '01/01/2017'

Oracle SQL Between Date, Except Some Month

I've got a query which basically gets information concerning the last 2 years from the current date.
DateTime jodaCurrentDate = new DateTime();
DateTimeFormatter formatter = DateTimeFormat.forPattern("YYYY-MM-dd HH:mm");
String startingDate = formatter.print(jodaCurrentDate.minusYears(2));
String endingDate = formatter.print(jodaCurrentDate);
"DATE BETWEEN TO_DATE (\'"+ startingDate +"\',\'YYYY-MM-DD HH24:MI\') AND " +
"TO_DATE (\'"+ endingDate +"\',\'YYYY-MM-DD HH24:MI\')";
Is it possible to retrieve data only for some particular months (01, 02, 03, 10, 11, 12, for example)?
This WHERE clause filters for a date between two dates and also filters on the month. Replace some_date with your date column.
where some_date between to_date('01/01/2015','MM/DD/YYYY') and to_date('12/31/2015','MM/DD/YYYY')
and extract(month from some_date) in (1, 2, 3, 10, 11, 12);