multiindex level output additional row - pandas

I am trying to set a MultiIndex on a simple pandas dataframe. The first level is the shop and the second is the type of fruit. I was expecting to see two groups, Shop1 and Shop2, in the first index level, but I have ended up with three: Shop1, Shop2 and then Shop1 again. Why is this happening?
Area2 = pd.DataFrame({'01/01/2017': [2000, 2500, 100, 1600],
'01/02/2017': [2000, 2500, 50, 1000],
'01/03/2017': [2000, 500, 50, 1600,],
'01/04/2017': [2500, 2000, 0, 1600],
'Fruit': ['Apples', 'Banana', 'Pears', 'b/berry'],
'Shop': ['Shop1', 'Shop2', 'Shop1', 'Shop1']})
S2 = Area2.set_index(['Shop', 'Fruit'])
Current output
01/01/2017 01/02/2017 01/03/2017 01/04/2017
Shop Fruit
Shop1 Apples 2000 2000 2000 2500
Shop2 Banana 2500 2500 500 2000
Shop1 Pears 100 50 50 0
b/berry 1600 1000 1600 1600
What I was expecting
01/01/2017 01/02/2017 01/03/2017 01/04/2017
Shop Fruit
Shop1 Apples 2000 2000 2000 2500
Pears 100 50 50 0
b/berry 1600 1000 1600 1600
Shop2 Banana 2500 2500 500 2000

I think you need sort_index for sorting MultiIndex:
df = S2.sort_index()
print (df)
01/01/2017 01/02/2017 01/03/2017 01/04/2017
Shop Fruit
Shop1 Apples 2000 2000 2000 2500
Pears 100 50 50 0
b/berry 1600 1000 1600 1600
Shop2 Banana 2500 2500 500 2000
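set_index keeps the original row order, and pandas only hides a repeated first-level label when the repeats are consecutive. That is why Shop1 shows up twice until the index is sorted.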
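As a small sketch of the point above, the snippet below rebuilds a cut-down version of the frame (one date column only, an illustrative simplification) and compares the first-level labels before and after sort_index. The variable names are made up for the example.

```python
import pandas as pd

# Cut-down version of the question's data (single date column for brevity)
area2 = pd.DataFrame({
    "01/01/2017": [2000, 2500, 100, 1600],
    "Fruit": ["Apples", "Banana", "Pears", "b/berry"],
    "Shop": ["Shop1", "Shop2", "Shop1", "Shop1"],
})

# set_index keeps the original row order, so Shop1 is split in two runs
unsorted = area2.set_index(["Shop", "Fruit"])
shops_unsorted = unsorted.index.get_level_values("Shop").tolist()

# sort_index brings equal first-level labels together
sorted_df = unsorted.sort_index()
shops_sorted = sorted_df.index.get_level_values("Shop").tolist()

print(shops_unsorted)
print(shops_sorted)
print(sorted_df)
```

After sorting, the three Shop1 rows are adjacent, so the printed frame shows the label only once.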

plsql sum another table values to joined tables

I have 3 tables. The first is student, the second is studentdetail, and the last is specialcodes.
student table
studentname | invoiceno |tax |invoiceamount
Paul 10 500 1950
Georghe 20 1000 6850
Mary 30 1500 1900
Messy 40 2000 7050
studentdetail
invoiceno | code | product | amount
10 101 pencil 100
10 102 rubber 350
10 103 bag 1500
20 108 wheel 5000
20 109 tv 1500
20 110 ps 300
20 111 mouse 50
30 103 bag 1500
30 105 keyboard 400
40 111 mouse 50
40 112 car 7000
I can join these two tables like this and get the result table below:
select s.studentname,s.tax,s.invoiceamount,st.product,sum(st.amount) from student s, studentdetail st
where s.invoiceno = st.invoiceno
group by
s.studentname,
s.tax,
s.invoiceamount,
st.product
result table
studentname tax invoiceamount product amount
Paul 500 1950 bag 1500
Paul 500 1950 pencil 100
Paul 500 1950 rubber 350
Messy 2000 7050 car 7000
Messy 2000 7050 mouse 50
Mary 1500 1900 bag 1500
Mary 1500 1900 keyboard 400
Georghe 1000 6850 mouse 50
Georghe 1000 6850 ps 300
Georghe 1000 6850 tv 1500
Georghe 1000 6850 wheel 5000
The last table is specialcodes. It contains only one column, called code.
specialcodes table
code
101
102
113
104
105
110
111
What I want to do is look up the studentdetail rows whose code also appears in specialcodes, sum their amount values per invoice, and write that sum to the result table as another column. The result table should look like this:
result table(final)
studentname tax invoiceamount product amount taxexclude
Paul 500 1950 bag 1500 450
Paul 500 1950 pencil 100 450
Paul 500 1950 rubber 350 450
Messy 2000 7050 car 7000 50
Messy 2000 7050 mouse 50 50
Mary 1500 1900 bag 1500 400
Mary 1500 1900 keyboard 400 400
Georghe 1000 6850 mouse 50 350
Georghe 1000 6850 ps 300 350
Georghe 1000 6850 tv 1500 350
Georghe 1000 6850 wheel 5000 350
You can use analytic functions rather than GROUP BY and aggregating:
select s.studentname,
s.tax,
invoiceamount,
SUM(d.amount) OVER (PARTITION BY s.invoiceno) AS inv_amt_calc,
d.product,
d.amount,
SUM(CASE WHEN c.code IS NOT NULL THEN d.amount END)
OVER (PARTITION BY s.invoiceno) AS taxexclude
from student s
INNER JOIN studentdetail d
ON s.invoiceno = d.invoiceno
LEFT OUTER JOIN specialcodes c
ON (c.code = d.code)
Note: You can (and should) calculate the invoice amount from the studentdetail table rather than duplicating the data in the student table and violating third normal form.
Which, for your sample data, outputs:
STUDENTNAME  TAX   INVOICEAMOUNT  INV_AMT_CALC  PRODUCT   AMOUNT  TAXEXCLUDE
Paul         500   1950           1950          rubber    350     450
Paul         500   1950           1950          pencil    100     450
Paul         500   1950           1950          bag       1500    450
Georghe      1000  6850           6850          tv        1500    350
Georghe      1000  6850           6850          wheel     5000    350
Georghe      1000  6850           6850          ps        300     350
Georghe      1000  6850           6850          mouse     50      350
Mary         1500  1900           1900          bag       1500    400
Mary         1500  1900           1900          keyboard  400     400
Messy        2000  7050           7050          mouse     50      50
Messy        2000  7050           7050          car       7000    50
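For readers more at home in pandas, here is a rough analogue of the analytic-function query. It is only a sketch: it swaps SQL for pandas, where groupby(...).transform("sum") plays the role of SUM(...) OVER (PARTITION BY ...). The frame and variable names are assumptions for illustration, and unlike the SQL it yields 0 rather than NULL for an invoice with no special codes.

```python
import pandas as pd

student = pd.DataFrame({
    "studentname": ["Paul", "Georghe", "Mary", "Messy"],
    "invoiceno": [10, 20, 30, 40],
    "tax": [500, 1000, 1500, 2000],
    "invoiceamount": [1950, 6850, 1900, 7050],
})
detail = pd.DataFrame({
    "invoiceno": [10, 10, 10, 20, 20, 20, 20, 30, 30, 40, 40],
    "code": [101, 102, 103, 108, 109, 110, 111, 103, 105, 111, 112],
    "product": ["pencil", "rubber", "bag", "wheel", "tv", "ps",
                "mouse", "bag", "keyboard", "mouse", "car"],
    "amount": [100, 350, 1500, 5000, 1500, 300, 50, 1500, 400, 50, 7000],
})
special = pd.DataFrame({"code": [101, 102, 113, 104, 105, 110, 111]})

# INNER JOIN student to studentdetail on invoiceno
merged = student.merge(detail, on="invoiceno", how="inner")

# Per-invoice total, repeated on every row of that invoice
merged["inv_amt_calc"] = merged.groupby("invoiceno")["amount"].transform("sum")

# Per-invoice total of amounts whose code is a special code (0 if none)
is_special = merged["code"].isin(special["code"])
merged["taxexclude"] = (
    merged["amount"].where(is_special, 0)
    .groupby(merged["invoiceno"])
    .transform("sum")
)

print(merged[["studentname", "product", "amount", "inv_amt_calc", "taxexclude"]])
```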
If you really want a version using GROUP BY then:
SELECT s.studentname,
s.tax,
s.invoiceamount,
SUM(d.amount) OVER (PARTITION BY s.invoiceno) AS inv_amt_calc,
d.product,
d.amount,
t.taxexclude
FROM student s
INNER JOIN studentdetail d
ON s.invoiceno = d.invoiceno
LEFT OUTER JOIN (
SELECT invoiceno,
SUM(amount) AS taxexclude
FROM studentdetail
WHERE code IN (SELECT code FROM specialcodes)
GROUP BY
invoiceno
) t
ON s.invoiceno = t.invoiceno;
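To try the derived-table approach without an Oracle instance, the sketch below runs a trimmed version of the query against an in-memory SQLite database through Python's built-in sqlite3 module. Using SQLite is an assumption made only to keep the example self-contained, and the window-function column is left out so that it does not depend on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (studentname TEXT, invoiceno INTEGER, tax INTEGER, invoiceamount INTEGER);
CREATE TABLE studentdetail (invoiceno INTEGER, code INTEGER, product TEXT, amount INTEGER);
CREATE TABLE specialcodes (code INTEGER);
INSERT INTO student VALUES
  ('Paul', 10, 500, 1950), ('Georghe', 20, 1000, 6850),
  ('Mary', 30, 1500, 1900), ('Messy', 40, 2000, 7050);
INSERT INTO studentdetail VALUES
  (10, 101, 'pencil', 100), (10, 102, 'rubber', 350), (10, 103, 'bag', 1500),
  (20, 108, 'wheel', 5000), (20, 109, 'tv', 1500), (20, 110, 'ps', 300),
  (20, 111, 'mouse', 50), (30, 103, 'bag', 1500), (30, 105, 'keyboard', 400),
  (40, 111, 'mouse', 50), (40, 112, 'car', 7000);
INSERT INTO specialcodes VALUES (101), (102), (113), (104), (105), (110), (111);
""")

# Same shape as the GROUP BY version: aggregate the special-code amounts
# per invoice in a derived table, then join it back to the detail rows.
query = """
SELECT s.studentname, d.product, d.amount, t.taxexclude
FROM student s
INNER JOIN studentdetail d ON s.invoiceno = d.invoiceno
LEFT OUTER JOIN (
    SELECT invoiceno, SUM(amount) AS taxexclude
    FROM studentdetail
    WHERE code IN (SELECT code FROM specialcodes)
    GROUP BY invoiceno
) t ON s.invoiceno = t.invoiceno
ORDER BY s.invoiceno, d.code
"""
rows = conn.execute(query).fetchall()
taxexclude_by_student = {name: taxexclude for name, _, _, taxexclude in rows}

for row in rows:
    print(row)
```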

groupby sum month wise on date time data

I have transaction data, shown below, covering three months.
Card_Number Card_type Category Amount Date
0 1 PLATINUM GROCERY 100 10-Jan-18
1 1 PLATINUM HOTEL 2000 14-Jan-18
2 1 PLATINUM GROCERY 500 17-Jan-18
3 1 PLATINUM GROCERY 300 20-Jan-18
4 1 PLATINUM RESTRAUNT 400 22-Jan-18
5 1 PLATINUM HOTEL 500 5-Feb-18
6 1 PLATINUM GROCERY 400 11-Feb-18
7 1 PLATINUM RESTRAUNT 600 21-Feb-18
8 1 PLATINUM GROCERY 800 17-Mar-18
9 1 PLATINUM GROCERY 200 21-Mar-18
10 2 GOLD GROCERY 1000 12-Jan-18
11 2 GOLD HOTEL 3000 14-Jan-18
12 2 GOLD RESTRAUNT 500 19-Jan-18
13 2 GOLD GROCERY 300 20-Jan-18
14 2 GOLD GROCERY 400 25-Jan-18
15 2 GOLD HOTEL 1500 5-Feb-18
16 2 GOLD GROCERY 400 11-Feb-18
17 2 GOLD RESTRAUNT 600 21-Mar-18
18 2 GOLD GROCERY 200 21-Mar-18
19 2 GOLD HOTEL 700 25-Mar-18
20 3 SILVER RESTRAUNT 1000 13-Jan-18
21 3 SILVER HOTEL 1000 16-Jan-18
22 3 SILVER GROCERY 500 18-Jan-18
23 3 SILVER GROCERY 300 23-Jan-18
24 3 SILVER GROCERY 400 28-Jan-18
25 3 SILVER HOTEL 500 5-Feb-18
26 3 SILVER GROCERY 400 11-Feb-18
27 3 SILVER HOTEL 600 25-Mar-18
28 3 SILVER GROCERY 200 29-Mar-18
29 3 SILVER RESTRAUNT 700 30-Mar-18
I am struggling to get the dataframe below.
Card_No Card_Type D Jan_Sp Jan_N Feb_Sp Feb_N Mar_Sp GR_T RES_T
1 PLATINUM 70 3300 5 1500 3 1000 2300 100
2 GOLD 72 5200 5 1900 2 1500 2300 1100
3 SILVER 76 2900 5 900 2 1500 1800 1700
D = Duration in days from first transaction to last transaction.
Jan_Sp = Total spending in January.
Feb_Sp = Total spending in February.
Mar_Sp = Total spending in March.
Jan_N = Number of transactions in Jan.
Feb_N = Number of transactions in Feb.
GR_T = Total spending on GROCERY.
RES_T = Total spending on RESTRAUNT.
I tried following code. I am very new to pandas.
q9['Date'] = pd.to_datetime(Card_Number['Date'])
q9 = q9.sort_values(['Card_Number', 'Date'])
q9['D'] = q9.groupby('ID')['Date'].diff().dt.days
My approach has three steps:
1. Get the date range.
2. Get the monthly spending.
3. Get the category spending.
Step 1: Date
date_df = df.groupby('Card_type').Date.apply(lambda x: (x.max()-x.min()).days)
Step 2: Month
month_df = (df.groupby(['Card_type', df.Date.dt.month_name().str[:3]])
.Amount
.agg(['sum', 'count'])  # a list (not a set) keeps the column order deterministic
.rename({'sum':'_Sp', 'count': '_N'}, axis=1)
.unstack('Date')
)
# rename
month_df.columns = [b+a for a,b in month_df.columns]
Step 3: Category
cat_df = df.pivot_table(index='Card_type',
columns='Category',
values='Amount',
aggfunc='sum')
# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]
And finally concat:
pd.concat( (date_df, month_df, cat_df), axis=1)
gives:
Date Feb_Sp Jan_Sp Mar_Sp Feb_N Jan_N Mar_N GR_T HO_T RE_T
Card_type
GOLD 72 1900 5200 1500 2 5 3 2300 5200 1100
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500 1000
SILVER 76 900 3200 1500 2 5 3 1800 2100 1700
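The three steps can be put together in one runnable sketch. It uses only the PLATINUM rows from the question, with the dates rewritten in ISO format so that parsing does not depend on locale (both are simplifications for the example), and it assumes the Date column has already been converted to datetime, as the steps above do.

```python
import pandas as pd

# PLATINUM rows from the question, dates rewritten as ISO strings
df = pd.DataFrame({
    "Card_type": ["PLATINUM"] * 10,
    "Category": ["GROCERY", "HOTEL", "GROCERY", "GROCERY", "RESTRAUNT",
                 "HOTEL", "GROCERY", "RESTRAUNT", "GROCERY", "GROCERY"],
    "Amount": [100, 2000, 500, 300, 400, 500, 400, 600, 800, 200],
    "Date": pd.to_datetime([
        "2018-01-10", "2018-01-14", "2018-01-17", "2018-01-20", "2018-01-22",
        "2018-02-05", "2018-02-11", "2018-02-21", "2018-03-17", "2018-03-21",
    ]),
})

# Step 1: days between first and last transaction per card type
date_df = (df.groupby("Card_type")["Date"]
             .apply(lambda x: (x.max() - x.min()).days)
             .rename("D"))

# Step 2: monthly sum and count, then flatten (stat, month) -> "Jan_Sp"
month = df["Date"].dt.month_name().str[:3].rename("Month")
month_df = (df.groupby(["Card_type", month])["Amount"]
              .agg(["sum", "count"])
              .rename(columns={"sum": "_Sp", "count": "_N"})
              .unstack("Month"))
month_df.columns = [m + stat for stat, m in month_df.columns]

# Step 3: total spending per category, columns renamed to e.g. "GR_T"
cat_df = df.pivot_table(index="Card_type", columns="Category",
                        values="Amount", aggfunc="sum")
cat_df.columns = [c[:2] + "_T" for c in cat_df.columns]

result = pd.concat([date_df, month_df, cat_df], axis=1)
print(result)
```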
If your data have several years, and you want to separate them by year, then you can add df.Date.dt.year in each groupby above:
date_df = df.groupby([df.Date.dt.year,'Card_type']).Date.apply(lambda x: (x.max()-x.min()).days)
month_df = (df.groupby([df.Date.dt.year,'Card_type', df.Date.dt.month_name().str[:3]])
.Amount
.agg(['sum', 'count'])  # a list (not a set) keeps the column order deterministic
.rename({'sum':'_Sp', 'count': '_N'}, axis=1)
.unstack(level=-1)
)
# rename
month_df.columns = [b+a for a,b in month_df.columns]
cat_df = (df.groupby([df.Date.dt.year,'Card_type', 'Category'])
.Amount
.sum()
.unstack(level=-1)
)
# rename
cat_df.columns = [a[:2]+"_T" for a in cat_df.columns]
pd.concat((date_df, month_df, cat_df), axis=1)
gives:
Date Feb_Sp Jan_Sp Mar_Sp Feb_N Jan_N Mar_N GR_T HO_T
Date Card_type
2017 GOLD 72 1900 5200 1500 2 5 3 2300 5200
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500
SILVER 76 900 3200 1500 2 5 3 1800 2100
2018 GOLD 72 1900 5200 1500 2 5 3 2300 5200
PLATINUM 70 1500 3300 1000 3 5 2 2300 2500
SILVER 76 900 3200 1500 2 5 3 1800 2100
I would recommend keeping the dataframe this way, so you can access the annual data: if result_df is the concatenated frame above, result_df.loc[2017] gives you the 2017 data. If you really want the years as columns, you can do result_df.unstack(level=0).

pandas groupby sum, count by date time, where consider only year

I have transaction data as shown below. It covers three months, each in a different year.
Card_Number Card_type Category Amount Date
0 1 PLATINUM GROCERY 100 10-Jan-18
1 1 PLATINUM HOTEL 2000 14-Jan-18
2 1 PLATINUM GROCERY 500 17-Jan-18
3 1 PLATINUM GROCERY 300 20-Jan-18
4 1 PLATINUM RESTRAUNT 400 22-Jan-18
5 1 PLATINUM HOTEL 500 5-Feb-19
6 1 PLATINUM GROCERY 400 11-Feb-19
7 1 PLATINUM RESTRAUNT 600 21-Feb-19
8 1 PLATINUM GROCERY 800 17-Mar-17
9 1 PLATINUM GROCERY 200 21-Mar-17
10 2 GOLD GROCERY 1000 12-Jan-18
11 2 GOLD HOTEL 3000 14-Jan-18
12 2 GOLD RESTRAUNT 500 19-Jan-18
13 2 GOLD GROCERY 300 20-Jan-18
14 2 GOLD GROCERY 400 25-Jan-18
15 2 GOLD HOTEL 1500 5-Feb-19
16 2 GOLD GROCERY 400 11-Feb-19
17 2 GOLD RESTRAUNT 600 21-Mar-17
18 2 GOLD GROCERY 200 21-Mar-17
19 2 GOLD HOTEL 700 25-Mar-17
20 3 SILVER RESTRAUNT 1000 13-Jan-18
21 3 SILVER HOTEL 1000 16-Jan-18
22 3 SILVER GROCERY 500 18-Jan-18
23 3 SILVER GROCERY 300 23-Jan-18
24 3 SILVER GROCERY 400 28-Jan-18
25 3 SILVER HOTEL 500 5-Feb-19
26 3 SILVER GROCERY 400 11-Feb-19
27 3 SILVER HOTEL 600 25-Mar-17
28 3 SILVER GROCERY 200 29-Mar-17
29 3 SILVER RESTRAUNT 700 30-Mar-17
I am struggling to get the dataframe below.
Card_No Card_Type D 2018_Sp 2018_N 2019_Sp 2019_N 2017_Sp
1 PLATINUM 70 3300 5 1500 3 1000
2 GOLD 72 5200 5 1900 2 1500
3 SILVER 76 2900 5 900 2 1500
D = Duration in days from first transaction to last transaction.
2018_Sp = Total spending in 2018.
2019_Sp = Total spending in 2019.
2017_Sp = Total spending in 2017.
2018_N = Number of transactions in 2018.
2019_N = Number of transactions in 2019.
Use:
#convert to datetimes
df['Date'] = pd.to_datetime(df['Date'])
#sorting if necessary
df = df.sort_values(['Card_Number','Card_type', 'Date'])
#aggregate count and sum
df1 = (df.groupby(['Card_Number','Card_type', df['Date'].dt.year])['Amount']
.agg([('Sp','sum'),('N','size')])
.unstack()
.sort_index(axis=1, level=1))
#MultiIndex to columns
df1.columns = [f'{b}_{a}' for a, b in df1.columns]
#difference (different output, because different years)
s = df.groupby('Card_type').Date.apply(lambda x: (x.max()-x.min()).days).rename('D')
#join together
df1 = df1.join(s).reset_index()
print (df1)
Card_Number Card_type 2017_N 2017_Sp 2018_N 2018_Sp 2019_N 2019_Sp \
0 1 PLATINUM 2 1000 5 3300 3 1500
1 2 GOLD 3 1500 5 5200 2 1900
2 3 SILVER 3 1500 5 3200 2 900
D
0 706
1 692
2 688
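As a quick check of the per-year aggregation, the sketch below runs it on the PLATINUM rows only, with dates rewritten in ISO format (both simplifications for the example). It uses keyword-style named aggregation, which is an alternative spelling chosen here for readability, with Sp as the sum of Amount and N as the number of transactions.

```python
import pandas as pd

# PLATINUM rows from the question, dates rewritten as ISO strings
df = pd.DataFrame({
    "Card_Number": [1] * 10,
    "Card_type": ["PLATINUM"] * 10,
    "Amount": [100, 2000, 500, 300, 400, 500, 400, 600, 800, 200],
    "Date": pd.to_datetime([
        "2018-01-10", "2018-01-14", "2018-01-17", "2018-01-20", "2018-01-22",
        "2019-02-05", "2019-02-11", "2019-02-21", "2017-03-17", "2017-03-21",
    ]),
})

year = df["Date"].dt.year.rename("Year")

# Sp = total spending, N = number of transactions, per card and year
per_year = (df.groupby(["Card_Number", "Card_type", year])["Amount"]
              .agg(Sp="sum", N="size")
              .unstack("Year")
              .sort_index(axis=1, level=1))

# Flatten (stat, year) column pairs into names such as "2018_Sp"
per_year.columns = [f"{y}_{stat}" for stat, y in per_year.columns]

# Days between the first and last transaction per card type
duration = (df.groupby("Card_type")["Date"]
              .apply(lambda x: (x.max() - x.min()).days)
              .rename("D"))

print(per_year)
print(duration)
```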

Select most recent entry in table by another column

I have 3 SQL tables. StudentTable, FeeAssociationTable and InvoiceTable.
StudentTable has a primary key of AdmissionNumber, FeeAssociationTable has a primary key of FeeAssociationID and InvoiceTable has a primary key of InvoiceID.
The FeeAssociationTable takes an AdmissionNumber and assigns a fee to it. When a fee is deposited, the InvoiceTable takes the AdmissionNumber, calculates the student's total fee, subtracts the amount paid, and inserts the remainder as the dues.
The problem is that the same AdmissionNumber can have multiple rows in InvoiceTable. How can I select and sum the most recent dues of each AdmissionNumber, without repeats?
Here is some data (columns: InvoiceId, AdmissionNumber, TotalFee, Month, Paid, Dues, Date):
37 1 3000 January-2018 3000 0 2018-08-17
38 2 3000 January-2018 3000 0 2018-08-17
39 3 3000 January-2018 3000 0 2018-08-17
40 4 3000 January-2018 3000 0 2018-08-17
41 5 3000 January-2018 3000 0 2018-08-17
42 6 3000 January-2018 3000 0 2018-08-17
43 7 3000 January-2018 3000 0 2018-08-17
44 8 3000 January-2018 3000 0 2018-08-17
45 9 3000 January-2018 3000 0 2018-08-17
46 10 3000 January-2018 3000 0 2018-08-17
47 1 3200 June-2018 2500 700 2018-08-17
48 2 3200 June-2018 2500 700 2018-08-17
49 3 3200 June-2018 2500 700 2018-08-17
50 4 3200 June-2018 2500 700 2018-08-17
51 5 3200 June-2018 2500 700 2018-08-17
52 6 3200 June-2018 2500 700 2018-08-17
53 7 3200 June-2018 2500 700 2018-08-17
54 8 3200 June-2018 2500 700 2018-08-17
55 9 3200 June-2018 2500 700 2018-08-17
56 10 3200 June-2018 2500 700 2018-08-17
57 10 3700 July-2018 2500 1200 2018-08-17
58 9 3700 July-2018 2400 1300 2018-08-17
59 8 3700 July-2018 200 3500 2018-08-17
60 7 3700 July-2018 2000 1700 2018-08-17
61 1 3700 July-2018 1500 2200 2018-08-17
62 2 3700 July-2018 3100 600 2018-08-17
Expectation:
I want the most recent dues of each student, without adding in the previous ones.
For example:
InvoiceId AdmissionNumber TotalFee Month Paid Dues Date
37 1 3000 January-2018 3000 0 2018-08-17
47 1 3200 June-2018 2500 700 2018-08-17
61 1 3700 July-2018 1500 2200 2018-08-17
There are 3 entries for AdmissionNumber 1 in the InvoiceTable. The first has no dues, the second has dues of 700, and the third has dues of 2200.
What I want is to select the last one, which can be done for a single student with the code below:
SELECT Dues FROM InvoiceTable AS IT
WHERE IT.InvoiceID = (SELECT MAX(InvoiceID)
FROM InvoiceTable WHERE AdmissionNumber = 1)
This works for a single student; I want a list of the most recent dues of every student.
Thanks in Advance
Based on your follow up information, I believe the following will give you what you are looking for:
SELECT invoiceId, AdmissionNumber, Dues
FROM InvoiceTable AS IT
WHERE IT.invoiceId IN (SELECT MAX(invoiceId)
FROM InvoiceTable
GROUP BY AdmissionNumber)
ORDER BY AdmissionNumber ASC
The query you tried in your example was close; it just had to be adapted to work for the whole table rather than a single AdmissionNumber.
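The query can be tried locally with a sketch like the one below. It uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption since the question does not name the database engine. It loads only the rows for AdmissionNumber 1 to 3 and only the columns the query touches, then sums the latest dues in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE InvoiceTable ("
    "InvoiceId INTEGER PRIMARY KEY, AdmissionNumber INTEGER, Dues INTEGER)"
)
# Subset of the question's rows: (InvoiceId, AdmissionNumber, Dues)
conn.executemany(
    "INSERT INTO InvoiceTable VALUES (?, ?, ?)",
    [(37, 1, 0), (38, 2, 0), (39, 3, 0),
     (47, 1, 700), (48, 2, 700), (49, 3, 700),
     (61, 1, 2200), (62, 2, 600)],
)

# Latest invoice per student = the row holding MAX(InvoiceId) in its group
latest = conn.execute("""
    SELECT InvoiceId, AdmissionNumber, Dues
    FROM InvoiceTable
    WHERE InvoiceId IN (SELECT MAX(InvoiceId)
                        FROM InvoiceTable
                        GROUP BY AdmissionNumber)
    ORDER BY AdmissionNumber ASC
""").fetchall()

# Sum of the most recent dues across all students
total_dues = sum(dues for _, _, dues in latest)

print(latest)
print(total_dues)
```

This relies on InvoiceId increasing over time, which is the same assumption the MAX(InvoiceId) query makes.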

Dynamic Columns With Calculated Values

1) StructureTable:
StrId StrName Figure Base On
1 Basic 40 Percent Total
2 D.A. 3495 Amount Total
3 O.A. 0 Remaining Total
2) SalaryTable
StaffId StaffName Salary
1 Ram 25000
2 Shyam 40000
3 Hari 30000
4 Ganesh 15000
3) IncrementTable
StaffId IncAmt
1 5000
2 3000
3 2000
4 4000
Desired Columns Resulted Output by Pivot or others:
StaffId StaffName Basic_Salary D.A_Salary O.A_Salary Total_Salary Basic_Inc D.A_Inc O.A_Inc Total_Inc Basic_Total D.A_Total O.A_Total Total_Total
1 Ram 10000 3495 18495 25000 2000 0 3000 5000 12000 3495 21495 30000
2 Shyam 16000 3495 27495 40000 1200 0 1800 3000 17200 3495 29295 43000
3 Hari 12000 3495 21495 30000 800 0 1200 2000 12800 3495 22695 32000
4 Ganesh 6000 3495 12495 15000 1600 0 2400 4000 7600 3495 14895 19000
Total 44000 13980 79980 110000 5600 0 8400 14000 49600 13980 88380 124000