MultiIndex Value - pandas

I have the following MultiIndex DataFrame and I am trying to get the last time entry for a slice of the MultiIndex.
df.loc['AUDCAD'][-1]
would return 2019-04-30 00:00:00
and
df.loc['USDCHF'][-1]
would return 2021-03-05 23:55:00
open high low close
AUDCAD 2018-12-31 00:00:00 0.95708 0.96276 0.95649 0.95979
2019-01-31 00:00:00 0.96039 0.96309 0.92200 0.94895
2019-02-28 00:00:00 0.94849 0.95800 0.93185 0.93655
2019-03-31 00:00:00 0.93718 0.95632 0.93160 0.94745
2019-04-30 00:00:00 0.94998 0.96147 0.94150 0.94750
USDCHF 2021-03-05 23:35:00 0.93109 0.93119 0.93108 0.93116
2021-03-05 23:40:00 0.93116 0.93150 0.93116 0.93143
2021-03-05 23:45:00 0.93143 0.93147 0.93127 0.93128
2021-03-05 23:50:00 0.93129 0.93134 0.93117 0.93126
2021-03-05 23:55:00 0.93126 0.93141 0.93114 0.93118

I guess you're looking for:
df.loc[block_name].index[-1]
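A minimal runnable sketch (with hypothetical values mirroring the data above): selecting one outer label returns a sub-frame indexed by the remaining datetime level, so .index[-1] is its last entry.
import pandas as pd

# Hypothetical two-level index like the one shown above
idx = pd.MultiIndex.from_tuples(
    [
        ("AUDCAD", pd.Timestamp("2019-03-31")),
        ("AUDCAD", pd.Timestamp("2019-04-30")),
        ("USDCHF", pd.Timestamp("2021-03-05 23:55:00")),
    ]
)
df = pd.DataFrame({"close": [0.94745, 0.94750, 0.93118]}, index=idx)

# df.loc['AUDCAD'] drops the outer level; the remaining index is the
# datetime level, so .index[-1] is the last time entry for that slice
print(df.loc["AUDCAD"].index[-1])  # Timestamp('2019-04-30 00:00:00')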

Related

Postgres generate_series how to exclude last day when hour is 00:00

I need to generate a series of days in PostgreSQL that produces a different result depending on the hour in the timestamp.
My series generation works fine when the time is not midnight.
For the time range 2023-01-06 10:00:00+00 to 2023-02-03 10:00:00+00 I get a list of days where the first element is 2023-01-06 and the last is 2023-02-03. This works as expected:
generate_series('2023-01-06 10:00:00+00'::date, '2023-02-03 10:00:00+00'::date, '1 day')
However, for the time range 2023-01-06 00:00:00+00 to 2023-02-03 00:00:00+00 I would like the first element to be 2023-01-06 and the last to be 2023-02-02, as 2023-02-03 effectively hasn't started. This series still gives me an output that includes 2023-02-03, which is not what I want:
generate_series('2023-01-06 00:00:00+00'::date, '2023-02-03 00:00:00+00'::date, '1 day')
Is that possible to achieve in Postgres?
You could check whether the end timestamp is at midnight and, if so, subtract 1 minute or 1 second from the end date:
SELECT *
FROM generate_series('2023-01-06 00:00:00+00'::date,
     (CASE WHEN to_char('2023-02-03 00:00:00+00'::timestamp, 'HH24:MI:SS') = '00:00:00'
           THEN '2023-02-03 00:00:00+00'::date - interval '1 minute'
           ELSE '2023-02-03 00:00:00+00'::date
      END),
     '1 day')
generate_series
2023-01-06 00:00:00
2023-01-07 00:00:00
2023-01-08 00:00:00
2023-01-09 00:00:00
2023-01-10 00:00:00
2023-01-11 00:00:00
2023-01-12 00:00:00
2023-01-13 00:00:00
2023-01-14 00:00:00
2023-01-15 00:00:00
2023-01-16 00:00:00
2023-01-17 00:00:00
2023-01-18 00:00:00
2023-01-19 00:00:00
2023-01-20 00:00:00
2023-01-21 00:00:00
2023-01-22 00:00:00
2023-01-23 00:00:00
2023-01-24 00:00:00
2023-01-25 00:00:00
2023-01-26 00:00:00
2023-01-27 00:00:00
2023-01-28 00:00:00
2023-01-29 00:00:00
2023-01-30 00:00:00
2023-01-31 00:00:00
2023-02-01 00:00:00
2023-02-02 00:00:00
SELECT 28

Pandas: merge two time series and get the mean values during the period when these two have overlapped time period

I have the following two pandas DataFrames:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
You can see that from 2007-01-07 to 2016-07-19 only ts1 has data points, and that from 2016-07-20 to 2017-10-11 the two series overlap. Now I want to combine these two DataFrames. During the overlapping period I want the mean of ts1 and ts2; during the non-overlapping periods (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31) the value at each timestamp should come from whichever of ts1 or ts2 has data. How can I do this?
Thanks!
Use concat with an aggregate mean: if there is only one value for a timestamp you get that value back, and if there are multiple you get their mean. The resulting DatetimeIndex is also sorted:
s = pd.concat([ts1, ts2]).groupby(level=0).mean()
Just store the concatenated series first and then apply the mean, i.e. merged_ts = pd.concat([ts1, ts2]) and then mean_ts = merged_ts.groupby(level=0).mean()
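A minimal runnable sketch (hypothetical values, and assuming both series hold the same quantity under one name) showing that overlapping timestamps are averaged while the rest pass through unchanged:
import pandas as pd

idx1 = pd.date_range('2016-07-19 22:00', periods=4, freq='h')
idx2 = pd.date_range('2016-07-20 00:00', periods=4, freq='h')
ts1 = pd.Series([0.10, 0.12, 0.14, 0.16], index=idx1, name='soil_moisture')
ts2 = pd.Series([0.20, 0.22, 0.24, 0.26], index=idx2, name='soil_moisture')

# Stack both series, then average duplicate timestamps; timestamps that
# appear in only one series keep their original value
merged = pd.concat([ts1, ts2]).groupby(level=0).mean()
print(merged)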

Aggregate time from 15-minute interval to single hour in SQL

I have the following table structure in SQL Server:
StartDate Start End Sales
==============================================
2020-08-25 00:00:00 00:15:00 291.4200
2020-08-25 00:15:00 00:30:00 401.1700
2020-08-25 00:30:00 00:45:00 308.3300
2020-08-25 00:45:00 01:00:00 518.3200
2020-08-25 01:00:00 01:15:00 247.3700
2020-08-25 01:15:00 01:30:00 115.4700
2020-08-25 01:30:00 01:45:00 342.3800
2020-08-25 01:45:00 02:00:00 233.0900
2020-08-25 02:00:00 02:15:00 303.3400
2020-08-25 02:15:00 02:30:00 11.9000
2020-08-25 02:30:00 02:45:00 115.2400
2020-08-25 02:45:00 03:00:00 199.5200
2020-08-25 06:00:00 06:15:00 0.0000
2020-08-25 06:15:00 06:30:00 45.2400
2020-08-25 06:30:00 06:45:00 30.4800
2020-08-25 06:45:00 07:00:00 0.0000
2020-08-25 07:00:00 07:15:00 0.0000
2020-08-25 07:15:00 07:30:00 69.2800
Is there a way to group the above data into one-hour intervals instead of 15-minute intervals?
It has to be based on the start and end columns.
Thanks,
Maybe something like the following using datepart?
select startdate, DatePart(hour,start) [Hour], Sum(sales) SalesPerHour
from t
group by startdate, DatePart(hour,start)

Pandas DateTime Calculating Daily Averages

I have 2 columns of data in a pandas DataFrame that look like this, with the "DateTime" column in the format YYYY-MM-DD HH:MM:SS. This is the first 24 hours, but the DataFrame covers one full year (8784 x 2).
BAFFIN BAY DateTime
8759 8.112838 2016-01-01 00:00:00
8760 7.977169 2016-01-01 01:00:00
8761 8.420204 2016-01-01 02:00:00
8762 9.515370 2016-01-01 03:00:00
8763 9.222840 2016-01-01 04:00:00
8764 8.872423 2016-01-01 05:00:00
8765 8.776145 2016-01-01 06:00:00
8766 9.030668 2016-01-01 07:00:00
8767 8.394983 2016-01-01 08:00:00
8768 8.092915 2016-01-01 09:00:00
8769 8.946967 2016-01-01 10:00:00
8770 9.620883 2016-01-01 11:00:00
8771 9.535951 2016-01-01 12:00:00
8772 8.861761 2016-01-01 13:00:00
8773 9.077692 2016-01-01 14:00:00
8774 9.116074 2016-01-01 15:00:00
8775 8.724343 2016-01-01 16:00:00
8776 8.916940 2016-01-01 17:00:00
8777 8.920438 2016-01-01 18:00:00
8778 8.926278 2016-01-01 19:00:00
8779 8.817666 2016-01-01 20:00:00
8780 8.704014 2016-01-01 21:00:00
8781 8.496358 2016-01-01 22:00:00
8782 8.434297 2016-01-01 23:00:00
I am trying to calculate daily averages of the "BAFFIN BAY" column and have tried these approaches:
davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()
All of these approaches yield the same answer, as shown below:
BAFFIN BAY
DateTime
2016-01-01 6.008044
However, if you do the math, the correct average for 2016-01-01 is 8.813134. I'm assuming the grouping is just by day (24 hours) to produce consecutive daily averages, but the 3 approaches above are clearly looking at other data in my 8784 x 2 DataFrame. Thank you kindly for your help.
I just ran your df with this code and I get 8.813134:
df['DateTime'] = pd.to_datetime(df['DateTime'])  # make sure these are real datetimes, not strings
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()  # one mean per calendar day
print(df)
Output:
BAFFIN BAY
DateTime
2016-01-01 8.813134
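For what it's worth, resample is an equivalent spelling of the same daily grouping; a minimal sketch on hypothetical hourly data:
import pandas as pd

df = pd.DataFrame({
    'DateTime': pd.date_range('2016-01-01', periods=48, freq='h'),
    'BAFFIN BAY': range(48),
})
# Resampling on the DateTime column averages each calendar day
daily = df.resample('D', on='DateTime').mean()
print(daily)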

Select data between night and day hours

My data looks like this; it is minute-based data covering two years.
2017-04-02 00:00:00
2017-04-02 00:01:00
2017-04-02 00:02:00
2017-04-02 00:03:00
2017-04-02 00:04:00
....
2017-04-02 23:59:00
...
2019-02-01 22:54:00
2019-02-01 22:55:00
2019-02-01 22:56:00
2019-02-01 22:57:00
2019-02-01 22:58:00
2019-02-01 22:59:00
2019-02-01 23:00:00
I want to access all the data rows between the end of one workday and the beginning of the next, for example between 2018-04-02 18:00:00 and 2018-04-03 05:00:00, for all the days in my data frame. Please help.
If you use a DatetimeIndex then you can use .between_time:
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('2017-04-02', freq='90min', periods=100)})
df = df.set_index('date')
df.between_time('18:00', '5:00')
#date
#2017-04-02 00:00:00
#2017-04-02 01:30:00
#2017-04-02 03:00:00
#2017-04-02 04:30:00
#2017-04-02 18:00:00
#2017-04-02 19:30:00
#2017-04-02 21:00:00
#2017-04-02 22:30:00
#....
One approach is boolean indexing based on conditions on the datetime column or index. Assuming your DataFrame is named df and has a DatetimeIndex matching the example data you've posted, try this:
# hour >= 18 keeps evenings; hour <= 5 keeps early mornings (through 05:59)
df[(df.index.hour >= 18) | (df.index.hour <= 5)]
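A minimal sketch on hypothetical minute data comparing the two answers; note they differ slightly at the boundary, since between_time('18:00', '05:00') stops at 05:00 inclusive while the boolean mask keeps everything up to 05:59:
import pandas as pd

df = pd.DataFrame(
    {'value': range(2880)},
    index=pd.date_range('2017-04-02', periods=2880, freq='min'),
)

night_bt = df.between_time('18:00', '05:00')                   # endpoints inclusive
night_bool = df[(df.index.hour >= 18) | (df.index.hour <= 5)]  # whole 5 o'clock hour

print(len(night_bt), len(night_bool))  # 1322 1440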