I'm using pandas for the first time, and I've provided a smaller version of a DataFrame I've created, like the following:
Date project1 project2 project3
0 12/10/2017 100 200 300
1 12/11/2017 0 100 100
2 12/12/2017 0 0 100
I need two cumulative sums: one for each separate project that stops at the first zero, and another across all of the projects row-wise. I keep struggling with either the date or just counting the zeros. Any advice would be appreciated.
So the output would look like:
Date project1 project2 project3
0 12/10/2017 100 200 300
1 12/11/2017 0 300 400
2 12/12/2017 0 0 500
and
Date project1 project2 project3 project_sum
0 12/10/2017 100 200 300 600
1 12/11/2017 0 300 400 700
2 12/12/2017 0 0 500 500
For your 1st question, use cumsum and cumprod: once a project hits zero its cumulative product stays zero, so masking on it zeroes out everything from that point on:
df[['project1','project2','project3']].cumsum().mask(
    df[['project1','project2','project3']].cumprod().eq(0), 0)
Out[86]:
project1 project2 project3
0 100 200 300
1 0 300 400
2 0 0 500
Then assign the masked result back and add the row-wise total using sum(axis=1):
df[['project1','project2','project3']] = df[['project1','project2','project3']].cumsum().mask(
    df[['project1','project2','project3']].cumprod().eq(0), 0)
df['projectSum'] = df[['project1','project2','project3']].sum(axis=1)
df
Out[89]:
Date project1 project2 project3 projectSum
0 12/10/2017 100 200 300 600
1 12/11/2017 0 300 400 700
2 12/12/2017 0 0 500 500
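An equivalent variant makes the "stopped" state explicit with a boolean mask instead of cumprod; a minimal sketch against the same df (the cols and alive names are mine):
cols = ['project1', 'project2', 'project3']
alive = df[cols].ne(0).cummin()               # flips to False at the first zero and stays False
df[cols] = df[cols].cumsum().where(alive, 0)  # keep the running sum only while still running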
I need to get a result from an existing table, and my current result looks like this:
Code Name Branch Dr Cr Net
1001 Closing Stock UP 150000 195000 -45000
1001 Closing Stock DL 159.74 0 159.74
1001 Closing Stock CH 0 24.37 -24.37
1002 IGST Payable UP 0 135.37 -135.37
1002 IGST Payable DL 0 200 -200
1002 IGST Payable CH 200 0 200
1003 Sundry Debtors UP 15767000 0 15767000
1003 Sundry Debtors DL 0 181716 -181716
Note: if the number of branches increases, then branch-wise (Debit, Credit, Net) columns should be created automatically.
And the result should look like this:
Code Name UPDr UPCr UPNet CHDr CHCr CHNet DLDr DLCr DLNet
1001 Stk 1500 1950 -450 0 24.37 -24.37 159 0 159.74
1002 IGST 0 135.37 -135 200 0 200 0 181 -181
1003 Sund 157 0 157 159 0 159 200 0 200
Please help with this.
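If the data is loaded into a pandas DataFrame df with the columns shown above, pivot_table creates the branch-wise columns automatically as branches are added; a minimal sketch (the wide name and the column-flattening step are my own):
wide = df.pivot_table(index=['Code', 'Name'],
                      columns='Branch',
                      values=['Dr', 'Cr', 'Net'],
                      aggfunc='sum')
# flatten the (measure, branch) column MultiIndex into names like UPDr, UPCr, UPNet
wide.columns = [branch + measure for measure, branch in wide.columns]
wide = wide.reset_index()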
I have data in time series format like:
date value
1-1-2013 100
1-2-2013 200
1-3-2013 300
1-4-2013 400
1-5-2013 500
1-6-2013 600
1-7-2013 700
1-8-2013 650
1-9-2013 450
1-10-2013 350
1-11-2013 250
1-12-2013 150
Use Series.pct_change, which computes the fractional change from the previous row:
In [458]: df['growth rate'] = df.value.pct_change()
In [459]: df
Out[459]:
date value growth rate
0 1-1-2013 100 NaN
1 1-2-2013 200 1.000000
2 1-3-2013 300 0.500000
3 1-4-2013 400 0.333333
4 1-5-2013 500 0.250000
5 1-6-2013 600 0.200000
6 1-7-2013 700 0.166667
7 1-8-2013 650 -0.071429
8 1-9-2013 450 -0.307692
9 1-10-2013 350 -0.222222
10 1-11-2013 250 -0.285714
11 1-12-2013 150 -0.400000
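Under the hood this is just division by the shifted series; a quick sanity check, assuming value has no missing entries:
manual = df.value / df.value.shift(1) - 1
assert manual.equals(df.value.pct_change())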
Or, if you want to show it as a percentage, multiply by 100:
In [480]: df['growth rate'] = df.value.pct_change().mul(100)
In [481]: df
Out[481]:
date value growth rate
0 1-1-2013 100 NaN
1 1-2-2013 200 100.000000
2 1-3-2013 300 50.000000
3 1-4-2013 400 33.333333
4 1-5-2013 500 25.000000
5 1-6-2013 600 20.000000
6 1-7-2013 700 16.666667
7 1-8-2013 650 -7.142857
8 1-9-2013 450 -30.769231
9 1-10-2013 350 -22.222222
10 1-11-2013 250 -28.571429
11 1-12-2013 150 -40.000000
For the growth rate computed within each year separately (the change resets at each year boundary):
df['col'] = df.groupby(['Year'])['col2'].pct_change(periods=1) * 100
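If you instead want one growth number per year, compare the first and last value within each year; a minimal sketch, with Year and value as assumed column names:
annual = df.groupby('Year')['value'].agg(['first', 'last'])
annual['growth_pct'] = (annual['last'] / annual['first'] - 1) * 100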
I have a table that looks like this:
TIMECODE UNIT_CODE Department Account AMOUNT
20194 10 1000 1000 100
20194 10 2354 1100 150
20194 10 1000 1000 200
20194 10 2354 1000 100
20194 20 500 1000 250
20194 20 500 1100 200
The results I need would look like this:
TIMECODE UNIT_CODE Department 1000 1100
20194 10 1000 300 NULL
20194 10 2354 100 150
20194 20 500 250 200
Hopefully that gives you a better picture; basically I need a SUM that depends on the distinct values of the other columns, so that the accounts previously stored in rows become columns.
Any ideas or help with this would be greatly appreciated.
Try the following conditional aggregation:
select
TIMECODE,
UNIT_CODE,
Department,
sum(case when Account = 1000 then AMOUNT end) as "1000",
sum(case when Account = 1100 then AMOUNT end) as "1100"
from myTable
group by
TIMECODE,
UNIT_CODE,
Department
Output:
---------------------------------------------------
| TIMECODE UNIT_CODE DEPARTMENT 1000 1100 |
---------------------------------------------------
| 20194 20 500 250 200 |
| 20194 10 1000 300 null|
| 20194 10 2354 100 150 |
---------------------------------------------------
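For comparison, the same reshaping in pandas, assuming the table is loaded into a DataFrame df with the columns above; combinations with no rows come out as NaN rather than NULL:
out = (df.pivot_table(index=['TIMECODE', 'UNIT_CODE', 'Department'],
                      columns='Account',
                      values='AMOUNT',
                      aggfunc='sum')
         .reset_index())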
I have transaction data, shown below, covering three months.
Card_Number Card_type Category Amount Date
0 1 PLATINUM GROCERY 100 10-Jan-18
1 1 PLATINUM HOTEL 2000 14-Jan-18
2 1 PLATINUM GROCERY 500 17-Jan-18
3 1 PLATINUM GROCERY 300 20-Jan-18
4 1 PLATINUM RESTRAUNT 400 22-Jan-18
5 1 PLATINUM HOTEL 500 5-Feb-18
6 1 PLATINUM GROCERY 400 11-Feb-18
7 1 PLATINUM RESTRAUNT 600 21-Feb-18
8 1 PLATINUM GROCERY 800 17-Mar-18
9 1 PLATINUM GROCERY 200 21-Mar-18
10 2 GOLD GROCERY 1000 12-Jan-18
11 2 GOLD HOTEL 3000 14-Jan-18
12 2 GOLD RESTRAUNT 500 19-Jan-18
13 2 GOLD GROCERY 300 20-Jan-18
14 2 GOLD GROCERY 400 25-Jan-18
15 2 GOLD HOTEL 1500 5-Feb-18
16 2 GOLD GROCERY 400 11-Feb-18
17 2 GOLD RESTRAUNT 600 21-Mar-18
18 2 GOLD GROCERY 200 21-Mar-18
19 2 GOLD HOTEL 700 25-Mar-18
20 3 SILVER RESTRAUNT 1000 13-Jan-18
21 3 SILVER HOTEL 1000 16-Jan-18
22 3 SILVER GROCERY 500 18-Jan-18
23 3 SILVER GROCERY 300 23-Jan-18
24 3 SILVER GROCERY 400 28-Jan-18
25 3 SILVER HOTEL 500 5-Feb-18
26 3 SILVER GROCERY 400 11-Feb-18
27 3 SILVER HOTEL 600 25-Mar-18
28 3 SILVER GROCERY 200 29-Mar-18
29 3 SILVER RESTRAUNT 700 30-Mar-18
I am struggling to get the DataFrame below:
Card_No Card_Type D Jan_Sp Jan_N Feb_Sp Feb_N Mar_Sp GR_T RES_T
1 PLATINUM 70 3300 5 1500 3 1000 2300 100
2 GOLD 72 5200 5 1900 2 1500 2300 1100
3 SILVER 76 2900 5 900 2 1500 1800 1700
D = duration in days from the first transaction to the last transaction.
Jan_Sp = total spending in January.
Feb_Sp = total spending in February.
Mar_Sp = total spending in March.
Jan_N = number of transactions in January.
Feb_N = number of transactions in February.
GR_T = total spending on GROCERY.
RES_T = total spending on RESTRAUNT.
I tried the following code. I am very new to pandas.
q9['Date'] = pd.to_datetime(q9['Date'])
q9 = q9.sort_values(['Card_Number', 'Date'])
q9['D'] = q9.groupby('Card_Number')['Date'].diff().dt.days
My approach has three steps, all of which assume Date is already parsed to datetime (see the sketch right after this list):
get the date range
get the monthly spending
get the category spending
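The parsing itself, using the day-month-year format from the sample data (the format string is my assumption):
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y')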
Step 1: Date
date_df = df.groupby('Card_type').Date.apply(lambda x: (x.max()-x.min()).days)
Step 2: Month
month_df = (df.groupby(['Card_type', df.Date.dt.month_name().str[:3]])
.Amount
.agg(['sum', 'count'])
.rename({'sum':'_Sp', 'count': '_N'}, axis=1)
.unstack('Date')
)
# rename
month_df.columns = [b+a for a,b in month_df.columns]
Step 3: Category
cat_df = df.pivot_table(index='Card_type',
columns='Category',
values='Amount',
aggfunc='sum')
# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]
And finally concat:
pd.concat( (date_df, month_df, cat_df), axis=1)
gives:
Date Feb_Sp Jan_Sp Mar_Sp Feb_N Jan_N Mar_N GR_T HO_T RE_T
Card_type
GOLD 72 1900 5200 1500 2 5 3 2300 5200 1100
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500 1000
SILVER 76 900 3200 1500 2 5 3 1800 2100 1700
If your data span several years and you want to separate them by year, you can add df.Date.dt.year to each groupby above:
date_df = df.groupby([df.Date.dt.year,'Card_type']).Date.apply(lambda x: (x.max()-x.min()).days)
month_df = (df.groupby([df.Date.dt.year,'Card_type', df.Date.dt.month_name().str[:3]])
.Amount
.agg(['sum', 'count'])
.rename({'sum':'_Sp', 'count': '_N'}, axis=1)
.unstack(level=-1)
)
# rename
month_df.columns = [b+a for a,b in month_df.columns]
cat_df = (df.groupby([df.Date.dt.year,'Card_type', 'Category'])
.Amount
.sum()
.unstack(level=-1)
)
# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]
pd.concat((date_df, month_df, cat_df), axis=1)
gives:
Date Feb_Sp Jan_Sp Mar_Sp Feb_N Jan_N Mar_N GR_T HO_T
Date Card_type
2017 GOLD 72 1900 5200 1500 2 5 3 2300 5200
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500
SILVER 76 900 3200 1500 2 5 3 1800 2100
2018 GOLD 72 1900 5200 1500 2 5 3 2300 5200
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500
SILVER 76 900 3200 1500 2 5 3 1800 2100
I would recommend keeping the DataFrame this way, so you can access the annual data: e.g. result_df.loc[2017] gives you the 2017 data. If you really want the years spread across columns instead, you can do result_df.unstack(level=0).
Say I have the following data in my table:
tran_date withdraw deposit
25/11/2010 0 500
2/12/2010 100 0
15/12/2010 0 300
18/12/2010 0 200
25/12/2010 200 0
Suppose I want to get the following for the date range between 1/12/2010 and 31/12/2010.
tran_date withdraw deposit balance days_since_last_tran
1/12/2010 0 0 500 0
2/12/2010 100 0 400 1
15/12/2010 0 300 700 13
18/12/2010 0 200 900 3
25/12/2010 200 0 700 7
31/12/2010 0 0 700 6
Is this doable in PostgreSQL 8.4?
Use:
SELECT t.tran_date,
t.withdraw,
t.deposit,
       (SELECT SUM(y.deposit) - SUM(y.withdraw)
FROM YOUR_TABLE y
WHERE y.tran_date <= t.tran_date) AS balance,
t.tran_date - COALESCE(LAG(t.tran_date) OVER(ORDER BY t.tran_date),
t.tran_date) AS days_since_last
FROM YOUR_TABLE t
8.4+ is nice, providing access to analytic/windowing functions like LAG.
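For comparison, the same two derived columns in pandas, assuming the rows are sorted by date and tran_date is already a datetime column (names taken from the question):
df['balance'] = (df['deposit'] - df['withdraw']).cumsum()
df['days_since_last_tran'] = df['tran_date'].diff().dt.days.fillna(0).astype(int)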