So I have a table with a date column (example below)
Date1
2020-11-08
2020-12-03
2020-11-21
I am trying to calculate the difference between the column and a specific date with the following code:
df['diff'] = pd.to_datetime('2020-12-31') - df['DELDATED']
I wanted to get a number of days difference, however I obtained the following:
diff
454579200000000000
2419200000000000
3456000000000000
Why am I getting this and how can I get what I anticipate?
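The subtraction produces timedeltas, and the large integers you see are those timedeltas expressed in nanoseconds. Use Series.dt.days to get whole days: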
df['diff'] = (pd.to_datetime('2020-12-31') - df['DELDATED']).dt.days
This is equivalent to using Series.rsub to subtract from the right-hand side, though that form is less clear in my opinion:
df['diff'] = df['DELDATED'].rsub(pd.to_datetime('2020-12-31')).dt.days
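For reference, here is a minimal self-contained sketch of the approach above, using the three sample dates from the question. The column name DELDATED follows the question's code, and the conversion with pd.to_datetime is an assumption in case the column is stored as strings.

```python
import pandas as pd

# Sample dates from the question; the column name follows the question's code.
df = pd.DataFrame({"DELDATED": ["2020-11-08", "2020-12-03", "2020-11-21"]})

# Assumption: the column may be stored as strings, so convert it to datetimes first.
df["DELDATED"] = pd.to_datetime(df["DELDATED"])

# Subtracting datetimes yields timedeltas; .dt.days extracts the whole days as integers.
df["diff"] = (pd.to_datetime("2020-12-31") - df["DELDATED"]).dt.days

print(df)
```

If the column is left as plain strings, the subtraction fails, so the conversion step is worth keeping when the dtype is uncertain.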
I have data with a 1-minute timestep covering almost a year and a half. I removed the timezone from the timestamp, so the format now looks like this: "yyyy-MM-dd HH:mm:ss". Then I renamed the time column to "ds" and the target to "y" so that it would work with NeuralProphet. It gives me the following error: (Column ds has duplicate values. Please remove duplicates.). This is because of the time change during the year: a whole hour is duplicated at 1-minute resolution. That is only 60 data points, and I thought of deleting them, but is there another way around it?
Note: I used Prophet before and I didn't face this problem.
Thanks
You should not remove the timezone with replace(tzinfo=None), because that keeps the local wall-clock times and so does not remove the duplicates caused by DST. Use tz_convert instead, which converts the values to UTC before dropping the timezone:
df['Time'] = pd.to_datetime(df['Time']).dt.tz_convert(None)
Output:
# Before
>>> df
Time
0 2021-11-07 01:47:00-07:00
# After
>>> df
Time
0 2021-11-07 08:47:00
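To illustrate why the two approaches differ, here is a small sketch with two timestamps one hour apart, on either side of the end of DST. The America/Denver zone is an assumption chosen to match the -07:00 offset in the output above; the variable names are illustrative only.

```python
import pandas as pd

# Two distinct instants one hour apart, on either side of the end of DST.
utc = pd.Series(pd.to_datetime(["2021-11-07 07:47:00", "2021-11-07 08:47:00"], utc=True))

# Assumption: the data was recorded in the America/Denver zone.
local = utc.dt.tz_convert("America/Denver")

# Dropping the zone while keeping wall-clock time makes both rows read 01:47:00.
wall_clock = local.dt.tz_localize(None)

# Converting to UTC before dropping the zone keeps the two rows distinct.
utc_naive = local.dt.tz_convert(None)

print(wall_clock.duplicated().any())
print(utc_naive.duplicated().any())
```

The first print should report duplicates and the second should not, which is the behaviour the "ds" column needs.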
My dataset has two columns, DATE and RATE.
I want to get the mean of RATE for each day (the dataset contains multiple rate values for the same day). I have about 1,000 rows, so I am trying to find an easy way to calculate the mean for each day and then save the results to a data frame.
You have to group by date, then aggregate:
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html
In your case
df.groupby('DATE').agg({'RATE': ['mean']})
You can group by the date and take the mean:
new_df = df.groupby('DATE').mean()
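A minimal sketch of the grouping, assuming a frame with the DATE and RATE columns from the question. The sample values are made up for illustration, and selecting RATE explicitly is a design choice so that any other non-numeric columns do not interfere with the mean.

```python
import pandas as pd

# Hypothetical sample data with several RATE values per DATE.
df = pd.DataFrame({
    "DATE": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-02"],
    "RATE": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per DATE holding the mean RATE; as_index=False keeps DATE as a regular column.
daily_mean = df.groupby("DATE", as_index=False)["RATE"].mean()

print(daily_mean)
```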
I have a table with a creation date and an action date. I'd like to get the number of minutes between the two dates. I looked at the docs and I'm having trouble finding a solution.
%sql
SELECT datediff(creation_dt, actions_dt)
FROM actions
limit 10
This gives me the number of days between the two dates. One record looks like
2019-07-31 23:55:22.0 | 2019-07-31 23:55:21 | 0
How can I get the number of minutes?
As stated in the comments, if you are using Spark or PySpark, the withColumn method is best.
However, if you are using the Spark SQL environment, you can use the unix_timestamp() function to get what you need:
select ((unix_timestamp('2019-09-09','yyyy-MM-dd') - unix_timestamp('2018-09-09','yyyy-MM-dd'))/60);
Swap the dates for your column names and pass your own date pattern as the second parameter.
Both dates are converted into seconds and the difference is taken. We then divide by 60 to get the minutes.
525600.0
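The arithmetic can be checked outside Spark with a short plain-Python sketch. This is not Spark code; it only mirrors the seconds-then-divide-by-60 calculation on the same two dates, ignoring any timezone effects.

```python
from datetime import datetime

# Same two dates as in the query above, parsed with the equivalent pattern.
start = datetime.strptime("2018-09-09", "%Y-%m-%d")
end = datetime.strptime("2019-09-09", "%Y-%m-%d")

# Difference in seconds, divided by 60 to get minutes.
minutes = (end - start).total_seconds() / 60

print(minutes)
```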
I use MS Access for my database.
I have 2 columns holding 2 different dates.
Example:
column 1 = 8/7/2016
column 2 = 8/11/2016
I want to subtract 8/7/2016 from 8/11/2016 and get a whole number.
In this example I should get 4. I intend to use the result in a further calculation.
Date(Time)s can be subtracted directly, resulting in a TimeSpan object, which has Days and TotalDays properties:
' Date literals are always read as M/d/yyyy, so this is Aug 11 minus Aug 7 (4 days).
Dim result = (#8/11/2016# - #8/7/2016#).Days
Have a look at DateDiff:
Days = DateDiff("d", #8/7/2016#, #8/11/2016#)
Or in an Access SQL query:
DateDiff('d', [DateCol1], [DateCol2])
Suppose I have a table MyTable with a column some_date (of date type, of course) and I want to select the newest 3 months of data (or the newest x days).
What is the best way to achieve this?
Please note that the range should be measured not from today but from the latest date in the table (which might be older than today).
I need to find the maximum date and compare it to each row - if the difference is less than x days, return it.
All of this should be done with sqlalchemy and without loading the entire table.
What is the best way of doing it? Must I use a subquery to find the maximum date? How do I select the last X days?
Any help is appreciated.
EDIT:
The following query works in Oracle but seems inefficient (is max calculated for each row?) and I don't think that it'll work for all dialects:
select * from my_table where (select max(some_date) from my_table) - some_date < 10
You can do this in a single query and without resorting to creating datediff.
Here is an example I used for getting everything in the past day:
one_day = timedelta(hours=24)
one_day_ago = datetime.now() - one_day
Message.query.filter(Message.created > one_day_ago).all()
You can adapt the timedelta to whatever time range you are interested in.
UPDATE
Upon re-reading your question, it looks like I failed to take into account that you want to compare two dates that are both in the database, rather than compare against today's date. I'm pretty sure that this sort of behavior is going to be database-specific. In Postgres, you can use straightforward arithmetic.
Operations with DATEs
1. The difference between two DATES is always an INTEGER, representing the number of DAYS difference
DATE '1999-12-30' - DATE '1999-12-11' = INTEGER 19
You may add or subtract an INTEGER to a DATE to produce another DATE
DATE '1999-12-11' + INTEGER 19 = DATE '1999-12-30'
You're probably using timestamps if you are storing dates in Postgres. Doing math with timestamps produces an interval object. SQLAlchemy works with timedeltas as a representation of intervals. So you could do something like:
one_day = timedelta(hours=24)
Model.query.join(ModelB, Model.created - ModelB.created < one_day)
I haven't tested this exactly, but I've done things like this and they have worked.
I ended up doing two selects: one to get the max date and another to get the data.
Using the datediff recipe from this thread, I added a datediff function and used the query q = session.query(MyTable).filter(datediff(max_date, some_date) < 10)
I still don't think this is the best way, but until someone proves me wrong, it will have to do...
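A possible variant of the two-select approach that avoids a custom datediff function is to compute the cutoff date in Python and compare against it, which should not depend on dialect-specific date arithmetic. The sketch below is untested against Oracle; it assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite database with made-up rows purely for illustration. The MyTable model and its columns are stand-ins for the real table.

```python
from datetime import date, timedelta

from sqlalchemy import Column, Date, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyTable(Base):
    """Stand-in model for the table in the question."""
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    some_date = Column(Date)


# Assumption: an in-memory SQLite database, used only to make the sketch runnable.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up sample rows; the newest date is well before "today".
    sample = [date(2020, 1, 1), date(2020, 1, 20), date(2020, 1, 25), date(2020, 1, 31)]
    session.add_all([MyTable(some_date=d) for d in sample])
    session.commit()

    # First select: the newest date in the table.
    max_date = session.query(func.max(MyTable.some_date)).scalar()

    # Second select: rows less than 10 days older than the newest date.
    cutoff = max_date - timedelta(days=10)
    recent = (
        session.query(MyTable)
        .filter(MyTable.some_date > cutoff)
        .order_by(MyTable.some_date)
        .all()
    )
    recent_dates = [row.some_date for row in recent]

print(recent_dates)
```

Because the filter is a plain comparison on an indexed-friendly column, the database does not need to evaluate a date difference for every row.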