Difference between 2 dates which are objects in dataframe - pandas

Dataframe has 2 dates which are of "object" datatype. StartDate and EndDate are in the mm/dd/yyyy format.
Name StartDate EndDate
bou1 1/9/2017 1/10/2017
bou2 12/31/2016 1/10/2017
Output:
Name StartDate EndDate Diff
bou1 1/9/2017 1/10/2017 1
bou2 12/31/2016 1/10/2017 10
Any suggestions would be appreciated !!

You first need to convert those columns to datetime and then subtract.
Try:
import numpy as np
import pandas as pd
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# Dividing a timedelta by one day gives the difference as a float number of days
df['difInDate'] = abs(df['StartDate'].sub(df['EndDate'], axis=0)) / np.timedelta64(1, 'D')
print(df['difInDate'])
abs just makes the number of days positive, because here the later date is subtracted from the earlier one.
Alternatively, you can use df['EndDate'].sub(df['StartDate']) instead.

import pandas as pd
# Recreating your dataframe with dates stored as strings
df = pd.DataFrame({'Name': ['bou1', 'bou2'],
                   'StartDate': ['01/09/2017', '12/31/2016'],
                   'EndDate': ['01/10/2017', '01/10/2017']})
# Date strings converted with pd.to_datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# The subtraction yields a timedelta Series; .dt.days extracts the whole days
df['Diff'] = (df['EndDate'] - df['StartDate']).dt.days
# Just prints the columns in your order
df[['Name', 'StartDate', 'EndDate', 'Diff']]
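As a cross-check on the expected Diff values, the same subtraction can be sketched with only the standard library. This is an illustrative sketch and not part of either answer; the helper name days_between is hypothetical, and it assumes the strings really are month-first (mm/dd/yyyy) as stated in the question.

```python
from datetime import datetime

def days_between(start: str, end: str, fmt: str = "%m/%d/%Y") -> int:
    """Return the whole days from start to end, both given as mm/dd/yyyy strings."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days

# Rows copied from the question
rows = [("bou1", "1/9/2017", "1/10/2017"), ("bou2", "12/31/2016", "1/10/2017")]
diffs = {name: days_between(start, end) for name, start, end in rows}
print(diffs)  # {'bou1': 1, 'bou2': 10}
```

Unlike the abs version, this keeps the sign, so a negative result flags a row where EndDate precedes StartDate.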

Related

Pandas: Read dates in column which has different formats

I would like to read a csv, with dates in a column, but the dates are in different formats within the column.
Specifically, some dates are in "dd/mm/yyyy" format, and some are in "4####" format (excel 1900 date system, serial number represents days elapsed since 1900/1/1).
Is there any way to use read_csv or pandas.to_datetime to convert the column to datetime?
Have tried using pandas.to_datetime with no parameters to no avail.
df["Date"] = pd.to_datetime(df["Date"])
Returns
ValueError: year 42613 is out of range
Presumably it can read the "dd/mm/yyyy" format fine but produces an error for the "4####" format.
Note: the column is mixed type as well
Appreciate any help
Example
dates = ['25/07/2016', '42315']
df = pd.DataFrame(dates, columns=['Date'])
# desired output: ['25/07/2016', '07/11/2015']
Let's try:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = (
pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
+ pd.Timestamp(year=1899, month=12, day=30)
)
df['Date'] = dates
Or alternatively with seconds conversion:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
m = dates.isna()
dates.loc[m] = pd.to_datetime(
(df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
unit='s'
)
df['Date'] = dates
df:
Date
0 2016-07-25
1 2015-11-07
Explanation:
1. First convert to datetime everything that pd.to_datetime can parse:
dates = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
2. Check which values couldn't be converted:
m = dates.isna()
3. Convert the NaTs:
a. Offset as days since 1899-12-30 using TimedeltaIndex + pd.Timestamp:
dates.loc[m] = (
pd.TimedeltaIndex(df.loc[m, 'Date'].astype(int), unit='d')
+ pd.Timestamp(year=1899, month=12, day=30)
)
b. Or convert serial days to seconds since the Unix epoch (25569 is the serial number of 1970-01-01):
dates.loc[m] = pd.to_datetime(
(df.loc[m, 'Date'].astype(int) - 25569) * 86400.0,
unit='s'
)
4. Update the Date column:
df['Date'] = dates
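The serial-number arithmetic can be checked without pandas. The following is a minimal standard-library sketch; the names EXCEL_EPOCH, UNIX_EPOCH_SERIAL and excel_serial_to_date are hypothetical and chosen only for illustration. It assumes serial numbers of 61 or more (1 March 1900 onward), because Excel's 1900 date system counts a nonexistent 29 February 1900, which is why the origin is 1899-12-30 rather than 1899-12-31.

```python
from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)   # origin used in the answer above
UNIX_EPOCH_SERIAL = 25569          # Excel serial number of 1970-01-01

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system serial number (>= 61) to a date."""
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_date(42315))              # 2015-11-07
print(excel_serial_to_date(UNIX_EPOCH_SERIAL))  # 1970-01-01
```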

remove few dates from date range

from datetime import timedelta, date
def daterange(date1, date2):
    for n in range(int((date2 - date1).days) + 1):
        yield date1 + timedelta(n)
start_dt = date(2015, 12, 20)
end_dt = date(2016, 1, 11)
for dt in daterange(start_dt, end_dt):
    print(dt.strftime("%Y-%m-%d"))
I have the date range shown above, but I need to ignore a few dates from it. These dates are in a dataframe.
How can I take these dates out of the date range? Any suggestions are welcome. The dataframe with the distinct dates is below.
Pardata = spark.read.parquet("/mnt/Test/data.parquet")
Pardata.createOrReplaceTempView("parfile")
ParRes = spark.sql("SELECT distinct date FROM parfile ")
Use left_anti join:
dates = [[dt.strftime("%Y-%m-%d")] for dt in daterange(start_dt, end_dt)]
dates_df = spark.createDataFrame(dates, ["date"])
dates_df.join(ParRes, dates_df["date"] == ParRes["date"], "left_anti").show()
First, create a DataFrame dates_df from that range of dates. Then use a left_anti join, which keeps only the rows of dates_df whose date has no match in ParRes.
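What the left_anti join computes can be sketched in plain Python, which may help when checking the Spark result on a small sample. This is only an illustration of the semantics, not a replacement for the join: the set dates_to_ignore is a hypothetical stand-in for the distinct dates in ParRes, and its two values are made up.

```python
from datetime import date, timedelta

def daterange(date1, date2):
    """Yield every date from date1 to date2, both inclusive."""
    for n in range((date2 - date1).days + 1):
        yield date1 + timedelta(n)

# Hypothetical stand-in for the distinct dates held in ParRes
dates_to_ignore = {"2015-12-25", "2016-01-01"}

all_dates = [dt.strftime("%Y-%m-%d")
             for dt in daterange(date(2015, 12, 20), date(2016, 1, 11))]
# Anti-join: keep the left-side rows that have no match on the right side
remaining = [d for d in all_dates if d not in dates_to_ignore]
print(len(all_dates), len(remaining))  # 23 21
```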

REDIS sorted set sort time into two types

REDIS 4.0.8
I would like to sort the following data by createDate and endDate:
info.item:*
field : createDate , endDate , Name
createDate is the current time converted to a numeric timestamp
endDate is a randomly chosen date after now
Name can be anything
and I add each info.item:* to item_List, scored by endDate:
zadd item_List endDate info.item:*
For example (dates are shown unconverted for readability; the stored values are numeric):
info.item:1 has createDate 2018-03-06 and endDate 2018-03-07
info.item:2 has createDate 2018-03-08 and endDate 2018-03-12
info.item:3 has createDate 2018-03-09 and endDate 2018-03-10
info.item:4 has createDate 2018-03-10 and endDate 2018-03-22
when using zrangebyscore
zrangebyscore endtime_Bucket 2018-03-08 +inf
i got
info.item:3
info.item:2
info.item:4
the result is correct.
Additionally, I want to sort by createDate when the endDate is later than now
I expect this result:
info.item:4
info.item:3
info.item:2
but failed.
I tried sort commands:
sort item_List by *->createDate desc
result:
info.item:4
info.item:3
info.item:2
info.item:1
How can I exclude items when the endDate is older than now and sort by createDate?
now is 2018-03-08
A Redis sorted set score is a 64-bit float, which represents integers exactly only within a 53-bit range:
-(2^53) to +(2^53) (both inclusive), i.e. -9007199254740992 to 9007199254740992.
This allows us to pack both createDate and endDate into the score.
A performance-optimized approach is to split the integer bits of the score: use the high 25 bits for endDate and the low 25 bits for createDate, leaving the remaining bits zero. endDate has to occupy the high bits so that a range query on the score filters by endDate. Each date must then be a value that fits in 25 bits, for example the number of days since the Unix epoch; a Unix timestamp in seconds needs about 31 bits and would not fit.
A simpler approach is to concatenate the date strings and convert the result to a number.
Example:
>>> endDate = "20180308"
>>> createDate = "20180305"
>>> endDate+createDate
'2018030820180305'
>>> int(endDate+createDate) < 9007199254740992
True
The two dates concatenated and converted to a number stay below the largest integer a sorted set score can hold exactly, and we can use this to our advantage.
To find items whose endDate is today or later, use ZRANGEBYSCORE with min = 2018030800000000 (you can also pass a max score to restrict endDate to a range instead of leaving it open-ended). The result is ordered by the whole score, that is by endDate first and by createDate only among items that share an endDate, so to order purely by createDate, sort the returned members on the client by the low 8 digits of the score.
This approach only works for year-first formats such as YYYYMMDD, because only then do lexicographical string order and numeric order both match chronological order. It fails for DD/MM/YYYY and for American MM/DD/YYYY dates.
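The concatenated-score idea can be tried out on the four items from the question. This is a sketch under stated assumptions: no Redis client is used, the dictionary scores merely emulates the sorted set, the filter emulates ZRANGEBYSCORE with a min score and +inf, and the helper name composite_score is hypothetical.

```python
# (createDate, endDate) per member, taken from the question, as YYYYMMDD strings
items = {
    "info.item:1": ("20180306", "20180307"),
    "info.item:2": ("20180308", "20180312"),
    "info.item:3": ("20180309", "20180310"),
    "info.item:4": ("20180310", "20180322"),
}

def composite_score(create_date: str, end_date: str) -> int:
    """endDate forms the high digits, createDate the low 8 digits."""
    return int(end_date + create_date)

scores = {member: composite_score(c, e) for member, (c, e) in items.items()}
assert all(s < 2**53 for s in scores.values())  # exactly representable as a score

# Emulates: ZRANGEBYSCORE item_List 2018030800000000 +inf WITHSCORES
min_score = int("20180308" + "00000000")
live = sorted(((m, s) for m, s in scores.items() if s >= min_score),
              key=lambda pair: pair[1])
by_score = [m for m, _ in live]

# Client-side reorder by createDate (low 8 digits of the score), newest first
by_create_desc = [m for m, _ in
                  sorted(live, key=lambda pair: pair[1] % 10**8, reverse=True)]
print(by_score)        # ['info.item:3', 'info.item:2', 'info.item:4']
print(by_create_desc)  # ['info.item:4', 'info.item:3', 'info.item:2']
```

The first list matches the endDate ordering the asker already gets from zrangebyscore, and the second matches the createDate ordering they expected, with info.item:1 excluded because its endDate is before 2018-03-08.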

Convert and Operation Not working as expected. Ignoring Year value

I have the following code:
CONVERT(VARCHAR(20), TimeCard_Date, 101) <
CONVERT(VARCHAR(20), dateadd(dd,-3,getdate()), 101)
The Original TimeCard_Date value = 2018-06-01
The GetDate() return = 11/14/2017
Can anyone assist as to why it thinks the Timecard_Date value set for June 2018 is less than the GetDate() minus 3 days value?
CONVERT returns a varchar, so you are comparing strings, not dates. As a varchar, 06/01/2018 is less than 11/14/2017 because strings are compared character by character and '0' sorts before '1'; the year is never reached. If you compare the values as dates, the comparison uses the date datatype and behaves as you expect.
You can change your code to:
TimeCard_Date < dateadd(dd,-3,getdate())
You don't need to convert DATETIME to VARCHAR in order to compare dates. Just use:
TimeCard_Date < DATEADD(dd,-3,GETDATE())
On the other hand, if you ever do have to compare them as strings, you have to standardize on a format that sorts chronologically (yyyyMMdd). You can check the FORMAT function: https://learn.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql
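The effect is not specific to SQL Server; any string comparison of mm/dd/yyyy text behaves this way. Here is a small Python sketch of the three comparisons, using the values from the question (the variable names are hypothetical, and the cutoff is 2017-11-14 minus 3 days):

```python
from datetime import date

timecard = date(2018, 6, 1)
cutoff = date(2017, 11, 11)  # 2017-11-14 minus 3 days

# mm/dd/yyyy strings, as style 101 produces: compared character by character
as_mmddyyyy = timecard.strftime("%m/%d/%Y") < cutoff.strftime("%m/%d/%Y")
# Real date comparison
as_dates = timecard < cutoff
# yyyyMMdd strings: string order agrees with chronological order
as_yyyymmdd = timecard.strftime("%Y%m%d") < cutoff.strftime("%Y%m%d")

print(as_mmddyyyy, as_dates, as_yyyymmdd)  # True False False
```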

Date subtraction error

I have a SQL Server table with a few columns.
One of those columns is a date and another is No of Nights.
Number of nights is always a two character varchar column with values like 1N, 2N, 3N etc depending on the number of nights up to 7N.
I want to subtract the 1 part of the 1N column from the date.
For ex: 25Oct15 - 1N = 24Oct15
Obviously I will be replacing the '1N' with the actual column name. I tried doing a trim as:
date - left(no of nights, 1)
But I get an error
Conversion failed when converting the varchar value '25Oct16' to data type int.
Sample data below
Date | NoofNIghts | Result
2016-04-26 00:00:00.000 | 1N |
2016-04-28 00:00:00.000 | 3N |
Where the result column would be the subtracted value. Any help would be great. Thanks.
SELECT DATEADD ( DAY, - CONVERT(INT, REPLACE(NoofNights, 'N', '')), getdate() ) as Result
Try this
DECLARE @V_Date DATETIME = '2016-04-26 00:00:00.000'
,@V_NoofNIghts VARCHAR(2) = '1N'
SELECT DATEADD(DAY, CAST(LEFT(@V_NoofNIghts,1) AS INT) *-1 ,@V_Date)
The basic query should look like this:
Update tablename
set result= DATEADD(d, -CAST(LEFT(NoofNIghts, LEN(NoofNIghts)-1) AS INT),Date)
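The logic shared by these answers (strip the trailing N, convert what is left to an integer, subtract that many days) can be sketched in Python to check the expected results for the sample rows. The function name subtract_nights is hypothetical, and the sketch assumes every value is one or more digits followed by N, as described in the question.

```python
from datetime import datetime, timedelta

def subtract_nights(start: datetime, nights: str) -> datetime:
    """Subtract the numeric part of a value such as '3N' from start, in days."""
    n = int(nights.rstrip("Nn"))  # '3N' -> 3
    return start - timedelta(days=n)

# Sample rows from the question
rows = [(datetime(2016, 4, 26), "1N"), (datetime(2016, 4, 28), "3N")]
results = [subtract_nights(d, n) for d, n in rows]
for r in results:
    print(r)  # both rows give 2016-04-25 00:00:00
```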