I have a DataFrame with monthly observations (var1, var2) for each group (Area):
date var1 var2 Area
2008-03-01 2 22 OH
2008-02-01 3 33 OH
2008-01-01 4 44 OH
... etc
2008-03-01 111 1111 AK
2008-02-01 222 2222 AK
2008-01-01 333 3333 AK
I wish to 'downsample' these variables to quarterly data by taking the 3-month mean, i.e. the first quarterly observation of var1 for 'OH' should be (2+3+4)/3 = 3.
How do I do this in pandas? Thank you
EDIT: Here is what I intended the output to be:
dateQtr var1 var2 Area
2008-Q1 3 33 OH
2007-Q4 ... ... OH
... etc
2008-Q1 222 2222 AK
If you set the index to 'date' then you can resample quarterly:
In [114]:
df.resample('Q').mean()
Out[114]:
var1 var2
date
2008-03-31 112.5 1127.5
So on your existing df:
In [116]:
df.set_index('date').resample('Q').mean()
Out[116]:
var1 var2
date
2008-03-31 112.5 1127.5
EDIT
Thanks to @JohnE for pointing this out:
In [134]:
df.set_index('date').groupby('Area')[['var1','var2']].resample('Q').mean().reset_index()
Out[134]:
Area date var1 var2
0 AK 2008-03-31 222 2222
1 OH 2008-03-31 3 33
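To get quarter labels like the dateQtr column in your desired output (pandas prints periods as 2008Q1), you can convert the quarter-end dates to periods afterwards, for example:
res = df.set_index('date').groupby('Area')[['var1', 'var2']].resample('Q').mean().reset_index()
res['dateQtr'] = res['date'].dt.to_period('Q')  # 2008Q1, 2007Q4, ...
res = res.drop(columns='date')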
I've got a data table like this:
id   line  datedt
123  1     01/01/2021
123  2     01/01/2021
123  3     01/01/2021
777  1     13/04/2020
777  2     13/04/2020
123  1     12/04/2021
123  2     12/04/2021
888  1     01/07/2020
888  2     01/07/2020
452  1     05/01/2020
888  1     02/05/2021
888  2     02/05/2021
I'd like to obtain a result like the one below, i.e. the number of distinct dates for each id.
For example, 123 appears with 2 different dates, as does 888, but there is only one date each for 777 and 452.
id   nb
123  2
777  1
888  2
452  1
How could I get that?
Hope it's clear :)
Thank you
select id , count(distinct datedt) as nb
from table
group by id
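For the pandas users, an equivalent sketch, assuming the table is loaded as a DataFrame df with columns id and datedt:
# number of distinct dates per id
nb = df.groupby('id', sort=False)['datedt'].nunique().reset_index(name='nb')
print(nb)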
I have this df:
ID Date X Y
A 16-07-19 123 56
A 17-07-19 456 84
A 18-07-19 0 58
A 19-07-19 123 81
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
C 14-07-19 0 86
C 16-07-19 456 91
C 17-07-19 456 86
C 18-07-19 0 41
C 19-07-19 456 26
C 20-07-19 456 17
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
I want to exclude IDs that have a non-zero value in the X column on back-to-back days.
For example: A has the value 123 on 16-07-19 and 456 on 17-07-19, so all of A's observations should be excluded.
Expected result:
ID Date X Y
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
Let's do this in a vectorized manner, to keep our code as efficient as possible
(meaning: we avoid using GroupBy.apply)
First we check, within each ID, if the difference in Date is equal to 1 day
We check if the X column is not equal to 0
We create a temporary column m where we check if both conditions are True
We group by ID and remove all groups where any of the rows are True
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True) <- if Date is not datetime type
m1 = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(days=1))  # 1-day gap within each ID
m2 = df['X'].ne(0)
df['m'] = m1 & m2
df = df[~df.groupby('ID')['m'].transform('any')].drop(columns='m')
   ID       Date    X   Y
4   B 2019-07-19  456  70
5   B 2019-07-21  789  46
6   B 2019-07-22    0  19
7   B 2019-07-23    0  91
14  D 2019-07-06  789  98
15  D 2019-07-08  789  90
16  D 2019-07-09    0  94
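Equivalently, you can skip the temporary column and filter with the combined mask directly:
mask = (m1 & m2).groupby(df['ID']).transform('any')
df = df[~mask]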
I would like to add some data (event_date) from table B to table A, as described below. It looks like a join on event_id; however, that column contains duplicate values in both tables. There are more columns in both tables, but I'm omitting them for clarity.
How to achieve the desired effect in Pandas and in SQL in the most direct way?
Table A:
id,event_id
1,123
2,123
3,456
4,456
5,456
Table B:
id,event_id,event_date
11,123,2017-02-06
12,456,2017-02-07
13,123,2017-02-06
14,456,2017-02-07
15,123,2017-02-06
16,123,2017-02-06
Desired outcome (table A + event_date):
id,event_id,event_date
1,123,2017-02-06
2,123,2017-02-06
3,456,2017-02-07
4,456,2017-02-07
5,456,2017-02-07
Using merge, first drop duplicates from B
In [662]: A.merge(B[['event_id', 'event_date']].drop_duplicates())
Out[662]:
id event_id event_date
0 1 123 2017-02-06
1 2 123 2017-02-06
2 3 456 2017-02-07
3 4 456 2017-02-07
4 5 456 2017-02-07
SQL part:
select distinct a.*, b.event_date
from table_a a
join table_b b
on a.event_id = b.event_id;
You can use Pandas merge to get the desired result; since event_id is duplicated in B, keep only the columns you are interested in from the merged DataFrame and drop the duplicated rows afterwards:
df_final = pd.merge(df1, df2, on='event_id', how='left')
df_final = df_final[['id_x', 'event_id', 'event_date']].drop_duplicates()
df_final = df_final.rename(columns={'id_x': 'id'})
print(df_final)
output
    id  event_id  event_date
0    1       123  2017-02-06
4    2       123  2017-02-06
8    3       456  2017-02-07
10   4       456  2017-02-07
12   5       456  2017-02-07
I'm working on the following scenario in SAS.
Input 1
AccountNumber Loans
123 abc, def, ghi
456 jkl, mnopqr, stuv
789 w, xyz
Output 1
AccountNumbers Loans
123 abc
123 def
123 ghi
456 jkl
456 mnopqr
456 stuv
789 w
789 xyz
Input 2
AccountNumbers Loans
123 15-abc
123 15-def
123 15-ghi
456 99-jkl
456 99-mnopqr
456 99-stuv
789 77-w
789 77-xyz
Output 2
AccountNumber Loans
123 15-abc, 15-def, 15-ghi
456 99-jkl, 99-mnopqr, 99-stuv
789 77-w, 77-xyz
I managed to get Input 2 from Output 1; I just need Output 2 now.
I will really appreciate the help.
Thanks!
Try this, replacing [Input 2] with the actual name of your Input 2 table.
data output2 (drop=loans);
    length loans_combined $100;
    /* DOW loop: read every row of the current accountnumbers group */
    do until (last.accountnumbers);
        set [Input 2];
        by accountnumbers;
        /* append each loan to the running comma-separated list */
        loans_combined = catx(', ', loans_combined, loans);
    end;
    /* the implicit output writes one combined row per account */
run;
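For comparison, a minimal pandas sketch of the same group-concatenation (assuming Input 2 is loaded as a DataFrame df with columns AccountNumbers and Loans):
# one row per account, loans joined into a comma-separated string
output2 = (df.groupby('AccountNumbers', sort=False)['Loans']
             .agg(', '.join)
             .reset_index())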
I am really struggling with this pivot and hoped that reaching out here might bring some enlightenment.
Say I have the following table:
Table A
type actId date rowSort order value value_char colName
------------------------------------------------------------------------------------
checking 1003 2011-12-31 2 1 44 44 Amount
checking 1003 2011-12-31 2 2 55 55 Interest
checking 1003 2011-12-31 2 3 66 66 Change
checking 1003 2011-12-31 2 4 77 77 Target
checking 1003 2011-12-31 2 5 88 88 Spread
savings 23456 2011-12-31 1 1 999 999 Amount
savings 23456 2011-12-31 1 2 888 888 Interest
savings 23456 2011-12-31 1 3 777 777 Change
savings 23456 2011-12-31 1 4 666 666 Target
savings 23456 2011-12-31 1 5 555 555 Spread
And I want to transpose it to table B:
checking chkId date rowSort order chkvalue chkValchar colName savings savId savVal savValChar
-------------------------------------------------------------------------------------------------------------------
checking 1003 2011-12-31 2 1 44 44 Amount savings 23456 999 999
checking 1003 2011-12-31 2 2 55 55 Interest savings 23456 888 888
checking 1003 2011-12-31 2 3 66 66 Change savings 23456 777 777
checking 1003 2011-12-31 2 4 77 77 Target savings 23456 666 666
checking 1003 2011-12-31 2 5 88 88 Spread savings 23456 555 555
I can admit this is beyond my skills at the moment.
I believe I need to pivot this table, using rowSort (to identify savings vs checking) along with ordering by the order column. This may be wrong, and that is why I am here.
Is a pivot the right way to go? Am I right to assume my pivot should use the aggregate max(rowSort)?
Assuming rowSort for checking equals rowSort+1 for savings, and the rows link through the order field, this should do it:
SELECT DISTINCT
    a.type as checking,
    a.actId as chkId,
    a.date,
    a.rowSort,
    a.[order],
    a.value as chkvalue,
    a.value_char as chkValchar,
    a.colName,
    b.type as savings,
    b.actId as savId,
    b.value as savVal,
    b.value_char as savValChar
FROM tablea a
INNER JOIN tablea b ON a.rowSort = b.rowSort + 1 AND a.[order] = b.[order]
Based on the requirements you presented, you will not use a PIVOT for this query; you will want to JOIN your table to itself. The query below should give you the records that you want without having to use DISTINCT:
select c.type as checking
, c.actId as chkid
, c.date
, c.rowsort
, c.[order]
, c.value as chkvalue
, c.value_char as chkValchar
, c.colName
, s.type as savings
, s.actId as savId
, s.value as savVal
, s.value_char as savValchar
from t1 c
inner join t1 s
on c.rowsort = s.rowsort + 1
and c.[order] = s.[order]
See SQL Fiddle with Demo
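If you ever need the same reshaping in pandas rather than SQL, a minimal sketch (assuming Table A is loaded as a DataFrame df with the columns shown in the question):
# align savings rows with checking rows: checking rowSort = savings rowSort + 1,
# and the two sides pair up on the order column
sav = df[df['type'] == 'savings'].copy()
sav['rowSort'] = sav['rowSort'] + 1
chk = df[df['type'] == 'checking']
tableb = chk.merge(
    sav[['rowSort', 'order', 'type', 'actId', 'value', 'value_char']],
    on=['rowSort', 'order'], suffixes=('', '_sav'))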