Rename a multi-index tuple in a pandas DataFrame

I have the dataframe below, produced by pd.pivot_table:
                             sum
                           Price
Manager        Status
Debra Henley   declined    70000
               pending     50000
               presented   50000
               won         65000
Fred Anderson  declined    65000
               pending      5000
               presented   45000
               won        172000
I want to add a TOTAL row at the end of the index to get this result:
                             sum
                           Price
Manager        Status
Debra Henley   declined    70000
               pending     50000
               presented   50000
               won         65000
Fred Anderson  declined    65000
               pending      5000
               presented   45000
               won        172000
All            TOTAL      522000
How can I do this, please?

For example this is the data frame:
Manager Status Price
0 Debra Henley declined 1000
1 Fred Anderson pending 1001
2 Debra Henley presented 1002
3 Fred Anderson won 1003
4 Debra Henley declined 1004
5 Fred Anderson pending 1005
6 Debra Henley presented 1006
7 Fred Anderson won 1007
8 Debra Henley declined 1008
9 Fred Anderson pending 1009
10 Debra Henley presented 1010
11 Fred Anderson won 1011
12 Debra Henley declined 1012
13 Fred Anderson declined 1013
14 Debra Henley pending 1014
15 Fred Anderson presented 1015
16 Debra Henley won 1016
17 Fred Anderson declined 1017
18 Debra Henley declined 1018
To pivot the table:
df.pivot_table(values='Price', index=['Manager', 'Status'], aggfunc='sum')
Result:
                           Price
Manager        Status
Debra Henley   declined     5042
               pending      1014
               presented    3018
               won          1016
Fred Anderson  declined     2030
               pending      3015
               presented    1015
               won          3021
To add a Total in the last row, pass margins=True and margins_name='Total' to pivot_table.
Final code:
df.pivot_table(values='Price', index=['Manager', 'Status'], margins=True, margins_name='Total', aggfunc='sum')
Result:
                           Price
Manager        Status
Debra Henley   declined     5042
               pending      1014
               presented    3018
               won          1016
Fred Anderson  declined     2030
               pending      3015
               presented    1015
               won          3021
Total                      19171
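If the last row should carry both index labels asked for in the question, ("All", "TOTAL"), one option is to build the pivot without margins and append the total yourself. The following is only a sketch under that assumption: the names pivot, total and result are illustrative, and the data is the example frame from this answer.

```python
import pandas as pd

# Example data from the answer: managers alternate, prices run 1000..1018.
statuses = [
    "declined", "pending", "presented", "won",
    "declined", "pending", "presented", "won",
    "declined", "pending", "presented", "won",
    "declined", "declined", "pending", "presented",
    "won", "declined", "declined",
]
df = pd.DataFrame({
    "Manager": ["Debra Henley" if i % 2 == 0 else "Fred Anderson" for i in range(19)],
    "Status": statuses,
    "Price": list(range(1000, 1019)),
})

# Pivot without margins, so the total row can be labelled freely.
pivot = df.pivot_table(values="Price", index=["Manager", "Status"], aggfunc="sum")

# One-row frame whose index is the tuple ("All", "TOTAL").
total = pd.DataFrame(
    {"Price": [pivot["Price"].sum()]},
    index=pd.MultiIndex.from_tuples([("All", "TOTAL")], names=pivot.index.names),
)

result = pd.concat([pivot, total])
print(result)
```

Appending with pd.concat keeps the Price column as integers and leaves the per-manager rows untouched.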
Hope it is useful

Related

Employee monthly report including leaves and working days

Consider a one-week report with dates from 11/1/2022 to 11/5/2022.
Table: Timesheet
timesheet_id  start_time_server       Login_by  user_id
1234          11/1/2022 16:20:00 AM   jon       101
1235          11/1/2022 12:20:100 AM  tom       102
1236          11/2/2022 18:40:00 AM   tom       102
1237          11/3/2022 18:40:00 AM   tom       102
Table: Leaves
timesheet_id  Leave_applied_date     Leave_start_time_server  user_id  user_name
1234          11/1/2022 16:20:00 AM  #########                101      jon
1234          11/1/2022 16:20:00 AM  #########                101      jon
1234          11/1/2022 16:20:00 AM  #########                102      jon
1234          11/1/2022 16:20:00 AM  #########                103      jon
1237          11/3/2022 18:40:00 AM  #########                102      tom
1237          11/3/2022 18:40:00 AM  #########                102      tom
Final output, DailyWorkReport:
user_name  11/1/2022  11/2/2022  11/3/2022  11/4/2022  11/5/2022
jon        8          Leave      Leave      Leave      Leave
tom        8          8          8          Leave      Leave
Please help me work out how to produce this final DailyWorkReport.

finding duplicate values with join

ITEMS
ITEM_ID NAME_ID ITEM_NAME
1001 2001 Office chair
1002 2002 Writing Desk
1003 2003 Filing cabinet
1004 2004 Bookshelf bookcase
1005 2005 Table lamp
1006 2001 Office chair
1007 2002 Writing Desk
1008 2003 Filing cabinet
1009 2004 Bookshelf bookcase
1010 2005 Table lamp
1011 2001 Office chair
1012 2002 Writing Desk
1013 2003 Filing cabinet
1014 2004 Bookshelf bookcase
1015 2005 Table lamp
1016 2016 Triangle window
1017 2017 Screen
1018 2018 Cradle
1019 2017 Screen
1020 2018 Cradle
1021 2017 Screen
1022 2018 Cradle
1023 2023 Futon
1024 2024 Single bed
1025 2025 Bunk beds
1026 2026 Sofa bed
1027 2027 Camp bed cot sleeping bag
1028 2028 Airbed air mattress
1029 2029 Hammock
1030 2030 Loveseat
1031 2031 Sleeper sofa
1032 2032 Settee
1032 2032 Settee
1033 2001 Office chair
1034 2002 Writing Desk
1035 2003 Filing cabinet
1036 2004 Bookshelf/bookcase
1037 2005 Table lamp
1038 2001 Office chair
1039 2002 Writing Desk
1040 2003 Filing cabinet
1041 2004 Bookshelf/bookcase
1042 2005 Table lamp
1043 2017 Screen
1044 2018 Cradle
1045 2017 Screen
1046 2018 Cradle
1047 2017 Screen
1048 2018 Cradle
1049 2017 Screen
1050 2018 Cradle
ITEMS_DETAILS:
CITY ITEM_ID SHOP_ID
NEW YORK 1001 4001
NEW YORK 1002 4002
NEW YORK 1003 4003
NEW YORK 1004 4004
NEW YORK 1005 4005
DALLAS 1006 4006
DALLAS 1007 4007
DALLAS 1008 4008
DALLAS 1009 4001
DALLAS 1010 4002
DALLAS 1011 4003
DALLAS 1012 4004
WASHINGTON 1013 4005
WASHINGTON 1014 4006
WASHINGTON 1015 4007
WASHINGTON 1016 4008
WASHINGTON 1017 4009
WASHINGTON 1018 4010
WASHINGTON 1019 4011
SANFRANSISCO 1020 4012
SANFRANSISCO 1021 4013
CHICAGO 1022 4014
CHICAGO 1023 4015
CHICAGO 1024 4016
CHICAGO 1025 4017
BOSTON 1026 4018
BOSTON 1027 4019
BOSTON 1028 4020
BOSTON 1029 4021
BOSTON 1030 4022
SANFRANSISCO 1031 4023
SANFRANSISCO 1032 4024
SANFRANSISCO 1032 4025
SANFRANSISCO 1033 4026
Las Vegas 1034 4027
Austin 1035 4028
Houston 1036 4029
Los Angeles 1037 4030
Seattle 1038 4031
Atlanta 1039 4032
McKinney 1040 4033
Vancouver 1041 4034
Las Vegas 1042 4035
Austin 1043 4036
Houston 1044 4037
Los Angeles 1045 4038
Seattle 1046 4034
Atlanta 1047 4035
McKinney 1048 4036
Vancouver 1049 4037
Las Vegas 1050 4043
Austin 1051 4044
Houston 1052 4045
Los Angeles 1053 4046
Seattle 1054 4047
Atlanta 1055 4048
McKinney 1056 4049
Vancouver 1057 4050
Las Vegas 1058 4051
Austin 1059 4052
Houston 1060 4053
Hi all,
I am trying to find duplicate values in the result of joining ITEMS and ITEMS_DETAILS.
I know the SQL for finding duplicate column values in a single table, but I am confused about how to do it with a join.
Logic: if ITEM_NAME is the same but SHOP_ID is different, the row should show as a duplicate. If SHOP_ID is the same, it should show as unique.
Please help me.
I tried the following:
select * from (
select a.NAME_ID from ITEMS a inner join ITEMS_DETAILS b on b.ITEM_ID = a.ITEM_ID) x
inner join ITEMS y on y.NAME_ID=x.NAME_ID
inner join ITEMS_DETAILS z on z.ITEM_ID=y.ITEM_ID
If you are interested in grouping and counting duplicates, try the query below, which counts the joined rows per ITEM_ID:
SELECT
    COUNT(*) AS DupCount,
    y.ITEM_ID
FROM
    ITEMS y
    INNER JOIN ITEMS_DETAILS z ON z.ITEM_ID = y.ITEM_ID
GROUP BY
    y.ITEM_ID
HAVING
    COUNT(*) > 1
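The rule stated in the question is about ITEM_NAME and SHOP_ID rather than ITEM_ID, so a variant that groups by ITEM_NAME and counts distinct SHOP_ID values may be closer to what is wanted. The sketch below is an assumption about the intended logic, not the answerer's method. It runs the query against an in-memory SQLite database from Python with a handful of rows taken from the tables above; SQL Server syntax for this particular query should be the same, but that has not been checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEMS (ITEM_ID INTEGER, NAME_ID INTEGER, ITEM_NAME TEXT);
    CREATE TABLE ITEMS_DETAILS (CITY TEXT, ITEM_ID INTEGER, SHOP_ID INTEGER);
""")

# A few rows copied from the question's sample data.
conn.executemany("INSERT INTO ITEMS VALUES (?, ?, ?)", [
    (1001, 2001, "Office chair"),
    (1006, 2001, "Office chair"),
    (1023, 2023, "Futon"),
    (1032, 2032, "Settee"),
])
conn.executemany("INSERT INTO ITEMS_DETAILS VALUES (?, ?, ?)", [
    ("NEW YORK", 1001, 4001),
    ("DALLAS", 1006, 4006),
    ("CHICAGO", 1023, 4015),
    ("SANFRANSISCO", 1032, 4024),
    ("SANFRANSISCO", 1032, 4025),
])

# An item name counts as a duplicate when it is sold in more than one shop.
query = """
    SELECT i.ITEM_NAME, COUNT(DISTINCT d.SHOP_ID) AS SHOP_COUNT
    FROM ITEMS AS i
    INNER JOIN ITEMS_DETAILS AS d ON d.ITEM_ID = i.ITEM_ID
    GROUP BY i.ITEM_NAME
    HAVING COUNT(DISTINCT d.SHOP_ID) > 1
    ORDER BY i.ITEM_NAME
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

Names that appear in only one shop (Futon in this subset) are filtered out by the HAVING clause and can be treated as unique.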

Pandas iterate over rows and conditional count

I am trying to iterate over the rows of a pandas DataFrame and compute a conditional count in a new column called Stage. For each name the stage should start at 1, and as long as the name stays the same, a new stage should start after each "Healthy" status. A "Healthy" event belongs to the same stage as the preceding "Sick" events, if there are any. I have done this in Excel before, but I am not sure how to do it in Python.
What I have now is:
Date        Name  Status
2020-01-02  Mary  Healthy
2020-01-05  Mary  Sick
2020-01-15  Mary  Sick
2020-01-20  Mary  Healthy
2020-02-03  Mary  Healthy
2020-02-06  Mary  Sick
2020-02-10  Mary  Sick
2020-02-15  Mary  Healthy
2020-01-02  Bob   Healthy
2020-01-05  Bob   Healthy
2020-01-15  Bob   Healthy
2020-01-20  Bob   Sick
2020-02-03  Bob   Sick
2020-02-06  Bob   Sick
2020-02-10  Bob   Sick
2020-02-15  Bob   Healthy
What I would like to have:
Date        Name  Status   Stage
2020-01-02  Mary  Healthy  1
2020-01-05  Mary  Sick     2
2020-01-15  Mary  Sick     2
2020-01-20  Mary  Healthy  2
2020-02-03  Mary  Healthy  3
2020-02-06  Mary  Sick     4
2020-02-10  Mary  Sick     4
2020-02-15  Mary  Healthy  4
2020-01-02  Bob   Healthy  1
2020-01-05  Bob   Healthy  2
2020-01-15  Bob   Healthy  3
2020-01-20  Bob   Sick     4
2020-02-03  Bob   Sick     4
2020-02-06  Bob   Sick     4
2020-02-10  Bob   Sick     4
2020-02-15  Bob   Healthy  4
You don't need an explicit loop. You need to do the following:
- group by the Name column
- apply to each group:
  - shift the Status column to look at the previous value
  - take the cumulative sum of the following series:
    - if the previous value is null and the current value is Healthy, we're at the first row, so call it one
    - if the previous row is Healthy, call it one
    - otherwise, call it zero
from io import StringIO

import numpy
import pandas

df = pandas.read_csv(StringIO("""\
|Date|Name|Status|
|2020-01-02|Mary|Healthy|
|2020-01-05|Mary|Sick|
|2020-01-15|Mary|Sick|
|2020-01-20|Mary|Healthy|
|2020-02-03|Mary|Healthy|
|2020-02-06|Mary|Sick|
|2020-02-10|Mary|Sick|
|2020-02-15|Mary|Healthy|
|2020-01-02|Bob|Healthy|
|2020-01-05|Bob|Healthy|
|2020-01-15|Bob|Healthy|
|2020-01-20|Bob|Sick|
|2020-02-03|Bob|Sick|
|2020-02-06|Bob|Sick|
|2020-02-10|Bob|Sick|
|2020-02-15|Bob|Healthy|
"""), sep='|').loc[:, ['Date', 'Name', 'Status']]  # drop the empty columns created by the outer pipes

output = (
    # group_keys=False keeps the original row index so that assign() can align the result
    df.assign(Stage=lambda df: df.groupby('Name', group_keys=False)['Status'].apply(lambda g:
        numpy.bitwise_or(  # True if either of the two conditions is met
            g.shift().eq('Healthy'),  # general case: the previous row closed a stage
            g.shift().isnull() & g.eq('Healthy')  # handles the first row of a group
        ).cumsum()
    ))
)
print(output.to_string())
And I get:
          Date  Name   Status  Stage
0   2020-01-02  Mary  Healthy      1
1   2020-01-05  Mary     Sick      2
2   2020-01-15  Mary     Sick      2
3   2020-01-20  Mary  Healthy      2
4   2020-02-03  Mary  Healthy      3
5   2020-02-06  Mary     Sick      4
6   2020-02-10  Mary     Sick      4
7   2020-02-15  Mary  Healthy      4
8   2020-01-02   Bob  Healthy      1
9   2020-01-05   Bob  Healthy      2
10  2020-01-15   Bob  Healthy      3
11  2020-01-20   Bob     Sick      4
12  2020-02-03   Bob     Sick      4
13  2020-02-06   Bob     Sick      4
14  2020-02-10   Bob     Sick      4
15  2020-02-15   Bob  Healthy      4
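The same idea can also be written without apply, using a grouped shift followed by a grouped cumulative sum. This is a sketch rather than a drop-in replacement: it assumes the rows are already ordered by date within each name, and, unlike the rule described above, it starts every name at stage 1 even when the first row is "Sick". The names previous and starts_new_stage are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Mary"] * 8 + ["Bob"] * 8,
    "Status": [
        "Healthy", "Sick", "Sick", "Healthy", "Healthy", "Sick", "Sick", "Healthy",
        "Healthy", "Healthy", "Healthy", "Sick", "Sick", "Sick", "Sick", "Healthy",
    ],
})

# Previous status within each name; NaN on the first row of a name.
previous = df.groupby("Name")["Status"].shift()

# A new stage starts on the first row of a name or right after a "Healthy" row.
starts_new_stage = previous.isna() | previous.eq("Healthy")

# Running count of stage starts, restarted for every name.
df["Stage"] = starts_new_stage.astype(int).groupby(df["Name"]).cumsum()
print(df.to_string())
```

Because both steps are plain groupby operations, the result keeps the original row index and can be assigned straight back to the frame.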

SQL server select from 3 tables

I have three tables in my database Books, Borrowers and Movement:
Books
BookID Title Author Category Published
----------- ------------------------------ ------------------------- --------------- ----------
101 Ulysses James Joyce Fiction 1922-06-16
102 Huckleberry Finn Mark Twain Fiction 1884-03-24
103 The Great Gatsby F. Scott Fitzgerald Fiction 1925-06-17
104 1984 George Orwell Fiction 1949-04-19
105 War and Peace Leo Tolstoy Fiction 1869-08-01
106 Gullivers Travels Jonathan Swift Fiction 1726-07-01
107 Moby Dick Herman Melville Fiction 1851-08-01
108 Pride and Prejudice Jane Austen Fiction 1813-08-13
110 The Second World War Winston Churchill NonFiction 1953-09-01
111 Relativity Albert Einstein NonFiction 1917-01-09
112 The Right Stuff Tom Wolfe NonFiction 1979-09-07
121 Hitchhikers Guide to Galaxy Douglas Adams Humour 1975-10-27
122 Dad Is Fat Jim Gaffigan Humour 2013-03-01
131 Kick-Ass 2 Mark Millar Comic 2012-03-03
133 Beautiful Creatures: The Manga Kami Garcia Comic 2014-07-01
Borrowers
BorrowerID Name Birthday
----------- ------------------------- ----------
2 Bugs Bunny 1938-09-08
3 Homer Simpson 1992-09-09
5 Mickey Mouse 1928-02-08
7 Fred Flintstone 1960-06-09
11 Charlie Brown 1965-06-05
13 Popeye 1933-03-03
17 Donald Duck 1937-07-27
19 Mr. Magoo 1949-09-14
23 George Jetson 1948-04-08
29 SpongeBob SquarePants 1984-08-04
31 Stewie Griffin 1971-11-17
Movement
MoveID BookID BorrowerID DateOut DateIn ReturnCondition
----------- ----------- ----------- ---------- ---------- ---------------
1 131 31 2012-06-01 2013-05-24 good
2 101 23 2012-02-10 2012-03-24 good
3 102 29 2012-02-01 2012-04-01 good
4 105 7 2012-03-23 2012-05-11 good
5 103 7 2012-03-22 2012-04-22 good
6 108 7 2012-01-23 2012-02-12 good
7 112 19 2012-01-12 2012-02-10 good
8 122 11 2012-04-14 2013-05-01 poor
9 106 17 2013-01-24 2013-02-01 good
10 104 2 2013-02-24 2013-03-10 bitten
11 121 3 2013-03-01 2013-04-01 good
12 131 19 2013-04-11 2013-05-23 good
13 111 5 2013-05-22 2013-06-22 poor
14 131 2 2013-06-12 2013-07-23 bitten
15 122 23 2013-07-10 2013-08-12 good
16 107 29 2014-01-01 2014-02-14 good
17 110 7 2014-01-11 2014-02-01 good
18 105 2 2014-02-22 2014-03-02 bitten
What is a query I can use to find out which book was borrowed by the oldest borrower?
I am new to SQL and am using Microsoft SQL Server 2014
Here are two different solutions.
First, using two subqueries and one equi-join:
select Title
from Books b , Movement m
where b.BookID = m.BookID and m.BorrowerID = (select BorrowerID
from Borrowers
where Birthday = (select MIN(Birthday)
from Borrowers))
Second, using two equi-joins and one subquery:
select Title
from Books b, Borrowers r, Movement m
where b.BookID = m.BookID
and m.BorrowerID = r.BorrowerID
and Birthday = (select MIN(Birthday) from Borrowers)
Both queries above give the following answer:
Title
------------------------------
Relativity
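For comparison, the second query can also be written with explicit INNER JOIN syntax instead of comma-separated tables. The sketch below checks that form against an in-memory SQLite database from Python, using a few rows from the tables above and only the columns the query needs. SQLite stands in for SQL Server here, which is an assumption; dates are stored as ISO text so that MIN picks the earliest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (BookID INTEGER, Title TEXT);
    CREATE TABLE Borrowers (BorrowerID INTEGER, Name TEXT, Birthday TEXT);
    CREATE TABLE Movement (MoveID INTEGER, BookID INTEGER, BorrowerID INTEGER);
""")

# A subset of the sample rows from the question.
conn.executemany("INSERT INTO Books VALUES (?, ?)", [
    (104, "1984"), (111, "Relativity"), (131, "Kick-Ass 2"),
])
conn.executemany("INSERT INTO Borrowers VALUES (?, ?, ?)", [
    (2, "Bugs Bunny", "1938-09-08"),
    (5, "Mickey Mouse", "1928-02-08"),
    (13, "Popeye", "1933-03-03"),
])
conn.executemany("INSERT INTO Movement VALUES (?, ?, ?)", [
    (10, 104, 2), (13, 111, 5), (14, 131, 2),
])

# Books borrowed by whoever has the earliest birthday.
query = """
    SELECT b.Title
    FROM Books AS b
    INNER JOIN Movement AS m ON m.BookID = b.BookID
    INNER JOIN Borrowers AS r ON r.BorrowerID = m.BorrowerID
    WHERE r.Birthday = (SELECT MIN(Birthday) FROM Borrowers)
"""
titles = [row[0] for row in conn.execute(query)]
print(titles)
```

In this subset the oldest borrower is Mickey Mouse (BorrowerID 5), whose only movement is BookID 111.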

SQL Queries (Difference between tables)

I'm trying to find a difference between two tables. The tables are
Sample Data
PERSON_PHOTO
ID USERID FNAME
801 uid01 George
801 uid05 George
803 uid01 George
901 uid01 Alice
201 uid01 Alice
330 uid01 Alice
802 uid05 Alice
803 uid05 Alice
804 uid05 Alice
901 uid05 Alice
701 uid05 Alice
201 uid05 Alice
101 uid05 Alice
330 uid05 Alice
501 uid05 Alice
501 uid12 Jane
330 uid12 Jane
101 uid12 Jane
201 uid12 Jane
701 uid12 Jane
801 uid12 Jane
901 uid12 Jane
101 uid07 Mary
101 uid03 Mary
201 uid03 Mary
801 uid03 Mary
901 uid03 Mary
201 uid15 Tom
801 uid15 Tom
Table VALID_FRIEND
FNAME USERID
Bill uid02
George uid01
Mary uid07
Jane uid12
Tom uid15
Alice uid05
Mary uid03
SAMPLE OUTPUT
USERID PHOTOS NOT IN
uid02 0
uid01 5
uid07 9
uid12 3
uid15 8
uid05 8
uid03 6
The query I'm trying to write should find the number of photos that each person is not in. The output should list each USERID with the number of photos that user does not appear in. I know I need to count the distinct photo IDs (the ID column) in PERSON_PHOTO and subtract the number of photos in which the USERID appears. Thanks for any help.
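The subtraction described above can be sketched as one query: count the distinct photo IDs overall, then subtract each user's own distinct photo count, using a LEFT JOIN so that friends with no photos still appear. This is only a sketch under stated assumptions: it runs on an in-memory SQLite database from Python rather than the asker's database, it treats the ID column as the photo identifier, and it loads just a subset of the sample rows, so the counts are not meant to match the sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PERSON_PHOTO (ID INTEGER, USERID TEXT, FNAME TEXT);
    CREATE TABLE VALID_FRIEND (FNAME TEXT, USERID TEXT);
""")

# Subset of the sample data: six distinct photo IDs in total.
conn.executemany("INSERT INTO PERSON_PHOTO VALUES (?, ?, ?)", [
    (801, "uid01", "George"), (803, "uid01", "George"),
    (901, "uid01", "Alice"), (201, "uid01", "Alice"), (330, "uid01", "Alice"),
    (101, "uid07", "Mary"),
    (201, "uid15", "Tom"), (801, "uid15", "Tom"),
])
conn.executemany("INSERT INTO VALID_FRIEND VALUES (?, ?)", [
    ("Bill", "uid02"), ("George", "uid01"), ("Mary", "uid07"), ("Tom", "uid15"),
])

# Photos not in = all distinct photos minus the user's own distinct photos.
query = """
    SELECT f.USERID,
           (SELECT COUNT(DISTINCT ID) FROM PERSON_PHOTO) - COUNT(DISTINCT p.ID)
               AS PHOTOS_NOT_IN
    FROM VALID_FRIEND AS f
    LEFT JOIN PERSON_PHOTO AS p ON p.USERID = f.USERID
    GROUP BY f.USERID
    ORDER BY f.USERID
"""
photos_not_in = conn.execute(query).fetchall()
print(photos_not_in)
```

A user with no rows in PERSON_PHOTO (uid02 here) gets a per-user count of zero from the LEFT JOIN, so the result for that user equals the total number of distinct photos.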