Select Rows based on a condition in Pandas dataframe with groupby - pandas

I have a pandas dataframe as below -
Federation Game Medal_each_game
0 AFG Athletics 1.00
1 AFG Boxing 0.00
2 AFG Football 1.00
3 AFG Hockey 0.00
4 AFG Taekwondo 2.00
5 AFG Wrestling 0.00
6 AHO Athletics 0.00
7 AHO Boxing 3.00
8 AHO Fencing 2.00
9 AHO Football 0.00
I need to find the highest medal count per 'Federation' and get the corresponding 'Game'.
The output should look something like this:
Federation Game Medal_each_game
0 AFG Taekwondo 2.00
1 AHO Boxing 3.00

Use groupby with idxmax to get the index label of each group's maximum, then select those rows with loc:
>>> df.loc[df.groupby('Federation')['Medal_each_game'].idxmax()]
Federation Game Medal_each_game
4 AFG Taekwondo 2.0
7 AHO Boxing 3.0
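
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same idxmax lookup. The variable names idx and top are only illustrative. Note that idxmax returns the first row when several games tie for the maximum within a federation.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Federation": ["AFG"] * 6 + ["AHO"] * 4,
    "Game": ["Athletics", "Boxing", "Football", "Hockey", "Taekwondo",
             "Wrestling", "Athletics", "Boxing", "Fencing", "Football"],
    "Medal_each_game": [1.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 2.0, 0.0],
})

# Index label of the row holding the maximum medal count in each federation
idx = df.groupby("Federation")["Medal_each_game"].idxmax()

# Select those rows; reset_index gives the 0, 1, ... numbering from the desired output
top = df.loc[idx].reset_index(drop=True)
print(top)
```

Dropping the reset_index call keeps the original row labels (4 and 7), as in the answer above.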


Filter a DataFrame based on groupby method and a column

I have the following DF
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
1 BP.L 2 2 504.8275 2022-10-04 14:14:11 0.00 1009.65 -1009.65 504.83
3 BP.L 0 -2 504.2625 2022-10-04 14:43:18 -1.13 -0.01 1008.52 0.00
4 AAPL 0 -3 142.4500 2022-10-04 15:28:33 0.00 284.93 427.35 0.00
5 AAPL 6 3 146.4000 2022-10-06 10:13:53 0.00 1151.51 -439.20 191.92
8 AAPL 47 47 171.5200 2022-08-18 13:45:02 0.00 8061.44 -8061.44 171.52
15 AAPL 0 -47 149.8400 2022-09-25 19:18:42 -1018.96 0.00 7042.48 0.00
20 AAPL 10 7 140.0900 2022-10-09 13:53:05 0.00 1692.94 -980.63 169.29
22 AAPL 3 3 142.4500 2022-10-04 09:06:15 0.00 712.31 -427.35 142.46
23 AAPL 0 3 138.3400 2022-10-13 09:38:23 -24.18 0.00 415.02 0.00
29 AAPL 0 7 138.3400 2022-10-13 09:38:26 -12.25 0.00 968.38 0.00
31 AAPL 5 5 138.3400 2022-10-13 09:46:32 0.00 691.70 -691.70 138.34
38 AAPL 0 5 150.3200 2022-11-01 18:42:08 59.90 0.00 751.60 0.00
44 AAPL 1 1 150.2700 2022-11-01 18:42:47 0.00 150.27 -150.27 150.27
55 AAPL 0 1 149.7000 2022-11-14 12:41:36 -0.57 0.00 149.70 0.00
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
68 AAPL 2 2 149.7000 2022-11-14 14:39:57 0.00 299.40 -299.40 149.70
70 AAPL 0 -2 148.2800 2022-11-15 09:07:41 -2.84 0.00 296.56 0.00
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
I need to filter out all the rows that are previous to the last time cml_units was 0 for each symbol.
For example on the above DF the result should be:
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
This is because BP.L on 2022-10-14 12:42:48 was the first purchase after cml_units were 0 on the 2022-10-04 14:43:18, and AAPL on the 2022-11-15 13:14:36 was the first purchase after cml_units were 0 on the 2022-11-15 09:07:41.
This DF can be in any shape, so I am trying to find a general way to achieve it that also works if the DF contains other stocks.
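First you should sort your df by time. Then you can group by symbol and concatenate the rows that come after each symbol's last zero:
import pandas as pd

# Assumes 'time' is already a datetime column
df = df.sort_values('time')
df_out = pd.DataFrame()
for sym, sub_df in df.groupby('symbol'):
    # Times at which this symbol's cml_units was 0
    zero_dates = sub_df[(sub_df['cml_units'] == 0)]['time']
    if not zero_dates.empty:
        last_zero_date = zero_dates.values[-1]
    else:
        # cml_units was never 0 for this symbol: keep all of its rows
        last_zero_date = pd.to_datetime(0)
    # Keep only the rows strictly after the last zero
    df_out = pd.concat([df_out, sub_df[sub_df['time'] > last_zero_date]])
print(df_out)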
Edit: adding handling of cases where cml_units is always >0
Output:
symbol cml_units number_of_shares price time gain_loss cml_cost cash_flow avg_price
id
74 AAPL 3 3 148.2800 2022-11-15 13:14:36 0.00 444.84 -444.84 148.28
66 BP.L 2 2 562.4942 2022-10-14 12:42:48 0.00 1124.98 -1124.99 562.49
73 BP.L 1 -1 562.1850 2022-11-15 09:12:41 -0.31 562.49 562.18 562.49
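
As a possible alternative to the loop, the same filter can be sketched without explicit iteration by broadcasting each symbol's latest zero time to all of its rows. This is only a sketch under the same assumption that 'time' is a datetime column. The helper name rows_after_last_zero and the small sample frame are made up for illustration and are not from the question.

```python
import pandas as pd

def rows_after_last_zero(df: pd.DataFrame) -> pd.DataFrame:
    """Keep, per symbol, only the rows later than the last time cml_units was 0."""
    # Time of every zero row, NaT everywhere else
    zero_times = df["time"].where(df["cml_units"].eq(0))
    # Latest zero time per symbol, repeated on each of that symbol's rows
    # (stays NaT for symbols whose cml_units never reached 0)
    last_zero = zero_times.groupby(df["symbol"]).transform("max")
    # Symbols without any zero keep all rows; others keep rows after the last zero
    keep = last_zero.isna() | (df["time"] > last_zero)
    return df[keep]

# Hypothetical sample: A and B hit zero, C never does
sample = pd.DataFrame({
    "symbol": ["A", "A", "A", "B", "B", "B", "C"],
    "cml_units": [2, 0, 3, 0, 1, 0, 5],
    "time": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-01-01", "2022-01-05", "2022-01-04",
        "2022-01-01",
    ]),
})

result = rows_after_last_zero(sample)
print(result)
```

Because the comparison is done on the timestamps themselves, this version does not rely on the frame being sorted first, and the rows come back in their original order.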

SQLServer - Pivoting a table with Group

I am wondering if what I am trying to do is possible. I believe it is using the PIVOT function in TSQL but don't have enough experience with the PIVOT function to know where to start.
Basically I'm trying to take the following # table called #tmpbudgetdata (truncated for simplicity):
Account Description BudgetAmount Period
-------------------- ---------------------------------------------------------------------------------------------------- --------------------- --------------------
4001 Mood Embedded Account 0.00 1
4001 Mood Embedded Account 0.00 2
4001 Mood Embedded Account 0.00 3
4001 Mood Embedded Account 0.00 4
4001 Mood Embedded Account 0.00 5
4001 Mood Embedded Account 0.00 6
4001 Mood Embedded Account 0.00 7
4001 Mood Embedded Account 0.00 8
4001 Mood Embedded Account 0.00 9
4001 Mood Embedded Account 0.00 10
4001 Mood Embedded Account 0.00 11
4001 Mood Embedded Account 0.00 12
4003 DBS Music 0.00 1
4003 DBS Music 0.00 2
4003 DBS Music 0.00 3
4003 DBS Music 0.00 4
4003 DBS Music 0.00 5
4003 DBS Music 0.00 6
4003 DBS Music 0.00 7
4003 DBS Music 0.00 8
4003 DBS Music 0.00 9
4003 DBS Music 0.00 10
4003 DBS Music 0.00 11
4003 DBS Music 0.00 12
4010 Sales - Software 5040.00 1
4010 Sales - Software 0.00 2
4010 Sales - Software 6280.56 3
4010 Sales - Software 6947.93 4
4010 Sales - Software 4800.00 5
4010 Sales - Software 0.00 6
4010 Sales - Software 2400.00 7
4010 Sales - Software 2550.00 8
4010 Sales - Software 4800.00 9
4010 Sales - Software 2400.00 10
4010 Sales - Software 0.00 11
4010 Sales - Software 2400.00 12
4015 New Install Revenue 0.00 1
4015 New Install Revenue 0.00 2
4015 New Install Revenue 0.00 3
4015 New Install Revenue 3844.79 4
4015 New Install Revenue 0.00 5
4015 New Install Revenue 0.00 6
4015 New Install Revenue 0.00 7
4015 New Install Revenue 0.00 8
4015 New Install Revenue 0.00 9
4015 New Install Revenue 0.00 10
4015 New Install Revenue 0.00 11
4015 New Install Revenue 0.00 12
and turning it into something like this:
Account Description Period1 Period2 Period3 Period4 Period5 Period6 Period7 Period8 Period9 Period10 Period11 Period12
------- --------------- -------- ------- -------- ------ ------- ------- -------- ------ ------- -------- -------- --------
4001 Mood Enabled... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4003 Dbs Music 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4010 Sales - Software 5040.00 0.00 6280.56 6947.93 4800.00 0.00 2400.00 2550.00 4800.00 2400.00 0.00 2400.00
...etc...
Basically just grouping via the Account column (the description is the same per account) and then taking the period values and pivoting them horizontally.
I know I could do it with a cursor and loop through but wondering if this is possible with a pivot or by other means.
Thanks in advance
A simple PIVOT should do the trick
Example
Select *
 From  (
        Select [Account]
              ,[Description]
              ,Period = concat('Period',Period)
              ,[BudgetAmount]
         From  YourTable
       ) src
 Pivot (sum([BudgetAmount]) for Period in ( [Period1],[Period2],[Period3],[Period4],[Period5],[Period6],[Period7],[Period8],[Period9],[Period10],[Period11],[Period12] ) ) pvt
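This returns one row per Account and Description, with the BudgetAmount values spread across the columns Period1 through Period12.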
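
To see the reshaping concretely without a SQL Server instance, here is a rough, runnable sketch in Python using the standard-library sqlite3 module. SQLite has no PIVOT operator, so the sketch swaps in conditional aggregation (SUM over a CASE per period), which produces the same shape and is also valid T-SQL. The table name budget is an assumption, and only periods 1 to 3 of two accounts from the question are loaded to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE budget (Account TEXT, Description TEXT, "
    "BudgetAmount REAL, Period INTEGER)"
)
# Subset of the question's data: periods 1-3 for accounts 4010 and 4015
conn.executemany(
    "INSERT INTO budget VALUES (?, ?, ?, ?)",
    [
        ("4010", "Sales - Software", 5040.00, 1),
        ("4010", "Sales - Software", 0.00, 2),
        ("4010", "Sales - Software", 6280.56, 3),
        ("4015", "New Install Revenue", 0.00, 1),
        ("4015", "New Install Revenue", 0.00, 2),
        ("4015", "New Install Revenue", 0.00, 3),
    ],
)

# One SUM(CASE ...) column per period plays the role of PIVOT's IN (...) list
periods = [1, 2, 3]
period_cols = ", ".join(
    f"SUM(CASE WHEN Period = {p} THEN BudgetAmount ELSE 0 END) AS Period{p}"
    for p in periods
)
sql = (
    f"SELECT Account, Description, {period_cols} "
    "FROM budget GROUP BY Account, Description ORDER BY Account"
)

pivoted = conn.execute(sql).fetchall()
for row in pivoted:
    print(row)
```

Extending the periods list to range(1, 13) would give the twelve columns asked for in the question.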

Cumulative Total by year

So, I have the following query,
WITH yearlist AS
(
SELECT (year(getdate())+3) AS years
UNION ALL
SELECT y.years - 1 AS years
FROM yearlist y
WHERE y.years - 1 >= (YEAR(GETDATE())-10)
)
SELECT
a.years as [year],
a.CountryName as country,
ISNULL(sum(b.sales), 0) as total
FROM(
SELECT
distinct years
,g.CountryName
FROM
yearlist AS A CROSS JOIN (SELECT
CountryName, salesYear, ISNULL(sum(sales), 0) as total
FROM tblSales where
salesYear BETWEEN (year(getdate())-12) AND (year(getdate()) + 3)
,sales
,salesYear) g
) a left outer join
(SELECT
CountryName, salesYear, ISNULL(sum(sales), 0) as total
FROM tblSales where
salesYear BETWEEN (year(getdate())-12) AND (year(getdate()) + 3)
group by CountryName
,salesYear, sales
) b ON a.CountryName=b.CountryName and a.years=b.salesYear
group by a.CountryName,years
order by years
I am getting the following returned:
year country Total
---------- ---------------------------------------- -------
2009 France 0.00
2009 Japan 0.00
2009 Norway 2.30
2009 Portugal 0.00
2009 South Korea 0.00
2009 Spain 0.00
2009 Sweden 0.00
2009 United Kingdom 0.00
2009 United States 0.00
2010 France 0.00
2010 Japan 0.00
2010 Norway 0.00
2010 Portugal 0.00
2010 South Korea 0.00
2010 Spain 0.00
2010 Sweden 0.00
2010 United Kingdom 0.00
2010 United States 0.00
2011 France 0.00
2011 Japan 0.00
2011 Norway 0.00
2011 Portugal 2.00
2011 South Korea 0.00
2011 Spain 0.00
2011 Sweden 0.00
2011 United Kingdom 0.00
2011 United States 0.00
2012 France 0.00
2012 Japan 0.01
2012 Norway 0.00
2012 Portugal 0.00
2012 South Korea 0.00
2012 Spain 0.00
2012 Sweden 0.00
2012 United Kingdom 0.00
2012 United States 0.00
2013 France 0.00
2013 Japan 2.00
2013 Norway 0.00
2013 Portugal 0.00
2013 South Korea 0.00
2013 Spain 0.00
2013 Sweden 0.00
2013 United Kingdom 0.00
2013 United States 0.00
I am trying to achieve a cumulative total for each country, as the years increase, but I can't seem to get it. I've tried this:
sum(sales) over (order by salesYear rows unbounded preceding) as total
But that just filled each row with the cumulative total.
The output I desire is as follows:
year country Total
---------- ---------------------------------------- -------
2009 France 0.00
2010 France 0.00
2011 France 0.00
2009 Japan 0.00
2010 Japan 0.00
2011 Japan 0.00
2009 Norway 2.30
2010 Norway 2.30
2011 Norway 2.30
2009 Portugal 0.00
2010 Portugal 0.00
2011 Portugal 2.00
2009 South Korea 0.00
2010 South Korea 0.00
2011 South Korea 0.00
2009 Spain 0.00
2010 Spain 0.00
2011 Spain 0.00
2009 Sweden 0.00
2010 Sweden 0.00
2011 Sweden 0.00
2009 United Kingdom 0.00
2010 United Kingdom 0.00
2011 United Kingdom 0.00
2009 United States 0.00
2010 United States 0.00
2011 United States 0.00
I just can't seem to get them to individually accumulate.
You most likely need a partition by clause, too, so that the running total restarts for each country:
sum(sum(sales)) over (partition by country order by salesYear rows unbounded preceding)
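
The effect of the partition can be checked with a small runnable sketch in Python's standard-library sqlite3 module (window functions need SQLite 3.25 or newer). This is not the asker's full query: the year-list cross join is left out, the sales figures are made up, and the per-year totals are computed in a CTE named yearly before the window sum is applied, which is one way of writing the nested sum(sum(sales)) in two steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblSales (CountryName TEXT, salesYear INTEGER, sales REAL)")
# Hypothetical figures, chosen so the sums are exact
conn.executemany(
    "INSERT INTO tblSales VALUES (?, ?, ?)",
    [
        ("Norway", 2009, 2.5),
        ("Norway", 2010, 1.0),
        ("Norway", 2010, 0.5),
        ("Portugal", 2011, 2.0),
        ("Portugal", 2012, 3.0),
    ],
)

sql = """
WITH yearly AS (
    -- one total per country and year
    SELECT CountryName, salesYear, SUM(sales) AS total
    FROM tblSales
    GROUP BY CountryName, salesYear
)
SELECT CountryName,
       salesYear,
       -- PARTITION BY restarts the running total for every country
       SUM(total) OVER (
           PARTITION BY CountryName
           ORDER BY salesYear
           ROWS UNBOUNDED PRECEDING
       ) AS running_total
FROM yearly
ORDER BY CountryName, salesYear
"""

running = conn.execute(sql).fetchall()
for row in running:
    print(row)
```

Without the PARTITION BY, Portugal's figures would be added on top of Norway's, which matches the symptom described in the question.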

VBA calculate individual hours, from list of multiple employees [closed]

I tried to avoid using VBA for this due to my lack of familiarity, but it looks like this is complex enough to require the Excel native language.
What I have is a CSV with a list of employees' hours worked each day, all in separate rows. Employee name, Date, Regular Hours, and OT Hours are in separate columns. The challenge is that for employee Bob, there may be 20 rows for one day, because Bob applies a slice of time to different projects all day. Then of course there are multiple days and multiple employees. What I am trying to end up with is a report that shows all of the regular and OT hours (separately) for each employee on a daily basis.
What I can't wrap my head around is how to start the compilation. I am guessing that separating each employee would be the start. Then separating each date, then adding all hours for that date.
I appreciate any assistance.
Emp# Name Date Reg OT
Emp1 Bob 1/1/2016 8.00 0.00
Emp1 Bob 1/4/2016 3.00 0.00
Emp1 Bob 1/4/2016 5.00 0.00
Emp1 Bob 1/5/2016 2.00 0.00
Emp1 Bob 1/5/2016 1.00 0.00
Emp1 Bob 1/5/2016 5.00 0.00
Emp1 Bob 1/6/2016 1.00 0.00
Emp1 Bob 1/6/2016 2.00 0.00
Emp1 Bob 1/6/2016 5.00 0.00
Emp1 Bob 1/7/2016 2.00 0.00
Emp2 Henry 1/1/2016 8.00 0.00
Emp2 Henry 1/4/2016 8.00 0.00
Emp2 Henry 1/5/2016 8.00 0.00
Emp2 Henry 1/6/2016 2.00 0.00
Emp2 Henry 1/6/2016 6.00 0.00
Emp2 Henry 1/7/2016 1.50 0.00
Emp2 Henry 1/7/2016 0.50 0.00
Emp2 Henry 1/7/2016 6.00 0.00
Emp2 Henry 1/8/2016 8.00 0.00
Emp2 Henry 1/11/2016 8.00 0.00
Emp2 Henry 1/12/2016 3.00 0.00
Emp2 Henry 1/12/2016 1.00 0.00
Emp2 Henry 1/12/2016 3.00 0.00
Emp2 Henry 1/12/2016 1.00 0.00
Emp2 Henry 1/13/2016 1.50 0.00
Your description sounds like you want to use a pivot table. They are easy to build - this example literally took me 5 minutes to build, including typing in the data.
As an illustration, you can take data that looks like the rows in your question and consolidate it in a way that provides a lot of flexibility in looking at it, for example total regular and OT hours per employee per day, or per employee overall.
There are several good, simple tutorials available on building Pivot Tables. A Google search turns up plenty.
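
If a script is preferred over the Excel UI, the same consolidation (sum of regular and OT hours per employee per day) can be sketched in Python with pandas. This is an alternative to the pivot table, not what the answer above built. The column names follow the sample in the question, only a few of its rows are included, and in practice the frame would come from reading the CSV, for instance with pd.read_csv.

```python
import pandas as pd

# A few rows from the question's sample
hours = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Henry", "Henry", "Henry"],
    "Date": ["1/1/2016", "1/4/2016", "1/4/2016",
             "1/7/2016", "1/7/2016", "1/7/2016"],
    "Reg": [8.0, 3.0, 5.0, 1.5, 0.5, 6.0],
    "OT": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

# One row per employee per day, with regular and OT hours summed separately
daily = hours.groupby(["Name", "Date"], as_index=False)[["Reg", "OT"]].sum()
print(daily)
```

Dates are kept as plain strings here; parsing them with pd.to_datetime first would make the result sort chronologically.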

Partition Items by Item# and Location in SQL Server

Given that I have the following dataset in TempTable:
Item Item Desc Location Qty LeasedQty
----------------------------------------------------------------
IT2250 1/2CANTOP NYC 1.00 30.00
IT5550 FCM 2K NYC 6.00 8.00
IT2075 HPTL 750 LA 4.00 44.00
IT12506 DOUBLE DOOR 10" CALI 60.00 0.00
IT3606 BAG180 CALI 25.00 0.00
IT3606 BAG180 NYC 20.00 40.00
IT3606 BAG180 LA 5.00 45.00
IT50 2K NYC 6.00 8.00
IT50 2K LA 4.00 44.00
IT50 2K CALI 60.00 0.00
How can I partition this data so that It will be Like the following:
Item Item Desc Location Qty LeasedQty RNK
----------------------------------------------------------------------
IT2250 1/2CANTOP NYC 1.00 30.00 1
IT5550 FCM 2K NYC 6.00 8.00 2
IT2075 HPTL 750 LA 4.00 44.00 3
IT12506 DOUBLE DOOR 10" CALI 60.00 0.00 4
IT3606 BAG180 CALI 25.00 0.00 5
IT3606 BAG180 NYC 20.00 40.00 5
IT3606 BAG180 LA 5.00 45.00 5
IT50 2K NYC 6.00 8.00 6
IT50 2K LA 4.00 44.00 6
IT50 2K CALI 60.00 0.00 6
Basically, I want the data to group by each item and gather the TOP 20 items based on the QTY (DESCENDING)
To get that rank, DENSE_RANK will help you:
SELECT Dense_rank()
         OVER (
           ORDER BY Item) rnk,
       *
FROM   TempTable
Added to get the top 20 records:
SELECT TOP 20 *
FROM   (SELECT Dense_rank()
                 OVER (
                   ORDER BY Item, qty DESC) rnk,
               *
        FROM   TempTable) a
WHERE  rnk <= 20
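
To see what DENSE_RANK ordered by Item produces, here is a small runnable sketch using Python's standard-library sqlite3 module (window functions need SQLite 3.25 or newer). It loads a few rows from the question and runs the first query only. Note that the ranks follow the sort order of Item, not the order in which the rows appear in the table, so they will not match the numbering in the question exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TempTable (Item TEXT, ItemDesc TEXT, Location TEXT, "
    "Qty REAL, LeasedQty REAL)"
)
# A few rows from the question
conn.executemany(
    "INSERT INTO TempTable VALUES (?, ?, ?, ?, ?)",
    [
        ("IT2250", "1/2CANTOP", "NYC", 1.0, 30.0),
        ("IT3606", "BAG180", "CALI", 25.0, 0.0),
        ("IT3606", "BAG180", "NYC", 20.0, 40.0),
        ("IT3606", "BAG180", "LA", 5.0, 45.0),
        ("IT50", "2K", "NYC", 6.0, 8.0),
        ("IT50", "2K", "LA", 4.0, 44.0),
    ],
)

# Rows sharing an Item share a rank, and ranks have no gaps
ranked = conn.execute(
    "SELECT DENSE_RANK() OVER (ORDER BY Item) AS rnk, Item, Location "
    "FROM TempTable ORDER BY rnk, Qty DESC"
).fetchall()
for row in ranked:
    print(row)
```

Filtering on rnk in an outer query then keeps whole items together, however many locations each item has.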