How to group merge columns based on one row identifier with pandas? - pandas

I have a dataset, in which it has a lot of entries for a single location. I am trying to find a way to sum up all of those entries without affecting any of the other columns. So, just in case I'm not explaining it well enough, I want to use a dataset like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 10 12 14 17 27
Bedford 11 40 34 9 1
Bedford 7 1 2 3 3
Leeds 1 1 2 0 0
Leeds 20 13 6 1 1
Bath 101 20 33 41 3
Bath 11 2 3 1 0
And turn it into something like this:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
Bedford 28 53 50 29 31
Leeds 21 33 39 1 1
Bath 111 22 36 42 3
Now, I have read up that a groupby should work in a way, but from my understanding a group by will change it into 2 columns and I don't particularly want to make hundreds of 2 columns and then merge it all. Surely there's a much simpler way to do this?

IIUC, groupby+sum will work for you:
df.groupby('Locations',as_index=False,sort=False).sum()
Output:
Locations Cyclists maleRunners femaleRunners maleCyclists femaleCyclists
0 Bedford 28 53 50 29 31
1 Leeds 21 14 8 1 1
2 Bath 112 22 36 42 3

Pivot table should work for you.
new_df = pd.pivot_table(df, values=['Cyclists', 'maleRunners', 'femalRunners',
'maleCyclists','femaleCyclists'],index='Locations', aggfunc=np.sum)

Related

How to calculate maximum of three in combined query

usage
The image shows works that I have done in access. The query "overall usage review" count the number of usages in each month by using combined query.
After this step, I want to show the maximum three of usage.quantity among 12months and calculate with the formula ( E.g., "Total MAX three months usage"/3)
E.g.,
Warehouse Part Number ........................................................
A X01 9 16 7 14 10 5 9 11 6 3 11 5
A X02 20 22 10 12 20 17 18 29 14 13 11 19
B X01 8 7 3 26 17 6 3 2 5 10 8 14
B X05 9 10 16 6 10 4 13 12 6 4 3 6
I want to result it as below...
Warehouse Part Number Maximum three usage quantity Results
A X01 41 41/3
A X02 71 71/3
B X01 57 19
B X05 39 13
Someone told me to use dynamic sql, but I dont know what it is... Pls tell me how to solve this problem in detail. The problem stuck in my mind for a very long time...

Oracle - Group By Creating Duplicate Rows

I have a query that looks like this:
select nvl(trim(a.code), 'Blanks') as Ward, count(b.apcasekey) as UNSP, count(c.apcasekey) as GRAPH,
count(d.apcasekey) as "ANI/PIG",
(count(b.apcasekey) + count(c.apcasekey) + count(d.apcasekey)) as "TOTAL ACTIVE",
count(a.apcasekey) as "TOTAL OPEN" from (etc...)
group by a.code
order by Ward
The reason I have nvl(trim(a.code), 'Blanks') as Ward is that sometimes a.code is a blank string, sometimes it's a null.
The problem is that when I use the Group By statement, I can't use Ward or I get the error
Ward: Invalid Identifier
I can only use a.code so I get 2 rows for 'Blanks', as per below
1 Blanks 7 0 0 7 7
2 Blanks 23 1 1 25 30
3 W01 75 4 0 79 91
4 W02 62 1 0 63 72
5 W03 140 2 0 142 162
6 W04 6 1 0 7 7
7 W05 46 0 1 47 48
8 W06 322 46 1 369 425
9 W07 91 0 1 92 108
10 W08 93 2 0 95 104
11 W09 28 1 0 29 30
12 W10 25 0 0 25 28
What I need, is for the row with 'Blanks' to combined into 1 row. Little help?
Thanks.
You can not use the alias in the GROUP BY, but you can use the expression that builds the value:
GROUP BY nvl(trim(a.code), 'Blanks')

Pandas: Error when merging two tables, Error with set_index

Thanks in advance for your help, here's my question:
I've successfully loaded my df in to ipython notebook and then I ran a group by on it:
station_count = station.groupby('landmark').count()
which produced a table like this:
Now I'm trying to merge it with another table:
dock_count_by_station = station.groupby('landmark').sum()
that is also a simple group by on the same table, but the merge produces an error:
TypeError: cannot concatenate a non-NDFrame object
with this code:
dock_count_by_station.merge(station_count)
I think the problem is that I need to set the index of the two tables before merging them but I keep getting this error for the code below:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()
KeyError: 'landmark'
station_count.set_index('landmark')
Using join
You can use join, which merges the tables on their index. You may also wish to specify the join type (e.g. 'outer', 'inner', 'left' or 'right'). You have overlapping column names (e.g. station_id), so you need to specify a suffix.
>>> dock_count_by_station.join(station_count, rsuffix='_rhs')
dockcount lat long station_id dockcount_rhs installation lat_rhs long_rhs name station_id_rhs
landmark
Mountain View 117 261.767433 -854.623012 210 7 7 7 7 7 7
Palo Alto 75 187.191873 -610.767939 180 5 5 5 5 5 5
Redwood City 115 262.406232 -855.602755 224 7 7 7 7 7 7
San Francisco 665 1322.569239 -4284.054814 2126 35 35 35 35 35 35
San Jose 249 560.039892 -1828.370075 200 15 15 15 15 15 15
Using merge
Note that your landmark index was set by default when you did the groupby. You can always use as_index=False if you don't want this to occur, but then you would have to use merge instead of join.
dock_count_by_station = station.groupby('landmark', as_index=False).sum()
station_count = station.groupby('landmark', as_index=False).count()
>>> dock_count_by_station.merge(station_count, on='landmark', suffixes=['_lhs', '_rhs'])
landmark dockcount_lhs lat_lhs long_lhs station_id_lhs dockcount_rhs installation lat_rhs long_rhs name station_id_rhs
0 Mountain View 117 261.767433 -854.623012 210 7 7 7 7 7 7
1 Palo Alto 75 187.191873 -610.767939 180 5 5 5 5 5 5
2 Redwood City 115 262.406232 -855.602755 224 7 7 7 7 7 7
3 San Francisco 665 1322.569239 -4284.054814 2126 35 35 35 35 35 35
4 San Jose 249 560.039892 -1828.370075 200 15 15 15 15 15 15

SUM column values based on two rows in the same tables in SQL

I have one table like below in my SQL server.
Trans_id br_code bill_no amount
1 22 111 10
2 22 111 20
3 22 111 30
4 22 111 40
5 22 111 10
6 23 112 20
7 23 112 20
8 23 112 20
9 23 112 30
and I want desired output like below table
s.no br_code bill_no amount
1 22 111 110
2 23 112 90
try this:
select br_code, bill_no, sum(amount)
from TABLE
group by br_code, bill_no

SQL terminology to combine a NOT EXIST query with latest value

I am a beginner with basic knowledge.
I have a single table that I am trying to pull all UID's that have not had a particular code in the table within the past year.
My table looks like this: (but much larger of course)
FACID DPID EID DID UID DT Code Units Charge ET Ord
1 1 6 2 1002 15-Mar-07 99204 1 180 09:36.7 1
1 1 7 5 10004 15-Mar-07 99213 1 68 02:36.9 1
1 1 24 55 25887 15-Mar-07 99213 1 68 43:55.3 1
1 1 25 2 355688 15-Mar-07 99213 1 68 53:20.2 1
1 1 26 5 555654 15-Mar-07 99213 1 68 42:22.6 1
1 1 27 44 135514 15-Mar-07 99213 1 68 00:36.8 1
1 1 28 2 3244522 15-Mar-07 99214 1 98 34:59.4 1
1 1 29 5 235445 15-Mar-07 99213 1 68 56:42.1 1
1 1 30 3 3214444 15-Mar-07 99213 1 68 54:56.5 1
1 1 33 1 221444 15-Mar-07 99204 1 180 37:44.5 1
I am attempting to use the following, but this is not working for my time frame limits.
select distinct UID from PtProcTbl
where DT<'20120101'
and NOT EXISTS (Select Distinct UID
where Code in ('99203','99204','99205','99213',
'99214','99215','99244','99245'))
I need to know how to make sure the UID's that I am pulling are the ones don't have a DT after the 1/1/2012 cut off date that contains one of the NOT Exists codes.
The above query returned UID's that actually dates after 1/1/2012 that does contain one of the above codes...
Not sure what I am doing wrong or if I am totally off base on this..
Thanks in advance.
Are you sure you need the NOT EXISTS? How about instead:
AND Code NOT IN ('99203','99204','99205','99213','99214','99215','99244','99245')