pandas group by and sum with values being displayed

I need to group by two columns and sum the third one. My data looks like this:
site industry spent
Auto Cars 1000
Auto Fashion 200
Auto Housing 100
Auto Housing 300
Magazine Cars 100
Magazine Fashion 200
Magazine Housing 300
Magazine Housing 500
My code:
df.groupby(by=['site', 'industry'])['spent'].sum()
The output is:
                   spent
site     industry
Auto     Cars       1000
         Fashion     200
         Housing     400
Magazine Cars        100
         Fashion     200
         Housing     800
When I export it to CSV I only get one column, spent. My desired output has the same format as the original data, except that spent is summed, and I need to see all the values as columns.

Try this, using as_index=False:
df = df.groupby(by=['site', 'industry'], as_index=False).sum()
print(df)
site industry spent
0 Auto Cars 1000
1 Auto Fashion 200
2 Auto Housing 400
3 Magazine Cars 100
4 Magazine Fashion 200
5 Magazine Housing 800
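A minimal, self-contained sketch of the approach above, assuming the column to sum is named spent as in the sample data. The variable names summed and csv_text are only illustrative. With as_index=False the group keys stay as ordinary columns, so they end up in the CSV too.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "site": ["Auto"] * 4 + ["Magazine"] * 4,
    "industry": ["Cars", "Fashion", "Housing", "Housing"] * 2,
    "spent": [1000, 200, 100, 300, 100, 200, 300, 500],
})

# as_index=False keeps 'site' and 'industry' as regular columns
summed = df.groupby(["site", "industry"], as_index=False)["spent"].sum()

# index=False leaves the 0..n-1 row labels out of the CSV;
# with no path given, to_csv returns the CSV as a string
csv_text = summed.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument of to_csv writes the same content to disk instead of returning a string.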

Related

Is there an easy way to make the number of labels equal in a pandas DataFrame?

When we load a dataset into a pandas DataFrame, the label categories are sometimes not in equal proportion.
Example: bike : car = 7 : 3
price label
200 bike
100 bike
700 bike
300 bike
5500 car
400 bike
5200 car
310 bike
2000 car
20 bike
In this case, car and bike do not appear in equal numbers,
so I want to make each category appear the same number of times.
car appears only 3 times, so 4 bike rows are deleted, like this:
price label
200 bike
300 bike
5500 car
5200 car
2000 car
20 bike
Order is not important. I just want the categories to have equal counts.
I counted the car labels and the bike labels, checked which label has fewer rows (here, car), and then read each row and moved it to another dataframe. That takes a lot of time, so it is inconvenient.
Is there an easier way to make the number of labels equal in a pandas DataFrame, or do I just have to count each label and build another dataframe?
Thank you.
IIUC, take the minimum of value_counts and pass it to GroupBy.head:
out = df.groupby("label").head(min(df["label"].value_counts()))  # or use GroupBy.sample
Alternatively, reuse a single grouper:
g = df.groupby("label")
out = g.head(g["price"].size().min())
Output:
print(out)
price label
0 200 bike
1 100 bike
2 700 bike
4 5500 car
6 5200 car
8 2000 car
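For reference, a runnable sketch of both variants mentioned above, using the sample data from the question. The names n, first_n and random_n are illustrative, and GroupBy.sample assumes pandas 1.1 or newer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "price": [200, 100, 700, 300, 5500, 400, 5200, 310, 2000, 20],
    "label": ["bike", "bike", "bike", "bike", "car",
              "bike", "car", "bike", "car", "bike"],
})

# Size of the smallest class (car appears 3 times)
n = int(df["label"].value_counts().min())

# Deterministic: keep the first n rows of each label, original order preserved
first_n = df.groupby("label").head(n)

# Random: draw n rows per label (random_state only makes the draw repeatable)
random_n = df.groupby("label").sample(n=n, random_state=0)

print(first_n)
```

head is the cheaper choice when any n rows will do; sample is preferable when the rows are ordered in a way that could bias the result.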

SQL - Sum a row of values based on Dates

I have the following type of data.
Ingredients:
Milk
Apple
Rice
...
Then the date each one was purchased:
26.10.2020
25.10.2020
etc
Each item is recorded when it is purchased.
I now want a column on the right-hand side showing how many times I bought apples, rice and milk.
Right now I only see:
Dates ---> 25.10.2020|24.10.2020
Rice 1 NULL
Milk 1 1
Apples NULL 1
My goal is to see:
Dates ---> 25.10.2020|24.10.2020 SUM
Rice 1 NULL 1
Milk 1 1 2
Apples NULL 1 1
Thank you for your support!
The example of the data
Now I want to see the total SUM at the end, since there will be multiple days.
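One possible way to get such a total is conditional aggregation plus a COUNT(*) column. This is a rough sketch only: the question does not show the real schema or database, so the table purchases, its columns ingredient and purchased, and the column aliases are all assumptions, and SQLite (through Python's sqlite3 module) is used just to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per purchase
conn.execute("CREATE TABLE purchases (ingredient TEXT, purchased TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        ("Rice", "25.10.2020"),
        ("Milk", "25.10.2020"),
        ("Milk", "24.10.2020"),
        ("Apples", "24.10.2020"),
    ],
)

# One SUM(CASE ...) per date column; with no ELSE branch, a date with no
# purchases stays NULL. COUNT(*) gives the total over all dates.
query = """
SELECT ingredient,
       SUM(CASE WHEN purchased = '25.10.2020' THEN 1 END) AS d_25_10_2020,
       SUM(CASE WHEN purchased = '24.10.2020' THEN 1 END) AS d_24_10_2020,
       COUNT(*) AS total
FROM purchases
GROUP BY ingredient
ORDER BY ingredient
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The total column does not depend on how many date columns are listed, so it stays correct as more days are added.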

Pandas: identify the number of items which generate 80% of sales

I have a dataframe that lists, for each country, the products and their sales.
For each country, I need to identify how many top-selling items are needed for their cumulative sales to reach 80% of the total sales of all items in that country.
E.g.
Cnt Product units
Italy apple 500
Italy beer 1500
Italy bread 2000
Italy orange 3000
Italy butter 3000
Expected results:
Italy 3
(Total units are 10,000, and the sales of the top 3 products, butter, orange and bread, are 8,000, which is 80% of the total.)
Try defining a function and applying it on the groupby:
def get_sale(x, pct=0.8):
    # threshold: pct of the group's total units
    thresh = pct * x.sum()
    # sort values in descending order to rank the top sales
    x = x.sort_values(ascending=False).reset_index(drop=True)
    # positions at which the cumulative sum has reached the threshold
    sale_pass_thresh = x.index[x.cumsum().ge(thresh)]
    return sale_pass_thresh[0] + 1

df.groupby('Cnt').units.apply(get_sale)
Output:
Cnt
Italy 3
Name: units, dtype: int64
This needs a little bit of logic:
df = df.sort_values('units', ascending=False)
g = df.groupby('Cnt').units
s = (g.cumsum() >= g.transform('sum') * 0.8).groupby(df.Cnt).sum()
df.groupby('Cnt').size() - s + 1
Out[710]:
Cnt
Italy 3.0
dtype: float64
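A self-contained sketch of the same idea, for anyone who wants to run it end to end. The Italy rows come from the question; the Spain rows and the helper name items_for_share are made up purely to show the per-country behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    # Italy rows are from the question; Spain rows are hypothetical
    "Cnt": ["Italy"] * 5 + ["Spain"] * 2,
    "Product": ["apple", "beer", "bread", "orange", "butter", "wine", "olive"],
    "units": [500, 1500, 2000, 3000, 3000, 900, 100],
})

def items_for_share(units: pd.Series, pct: float = 0.8) -> int:
    """Count the top-selling items needed to reach pct of the group's total."""
    ranked = units.sort_values(ascending=False)
    # items whose running total is still below the threshold,
    # plus the one item that reaches or crosses it
    below = ranked.cumsum() < pct * ranked.sum()
    return int(below.sum()) + 1

result = df.groupby("Cnt")["units"].apply(items_for_share)
print(result)
```

Counting the items that are still below the threshold avoids indexing into a possibly empty selection and returns a plain integer per country.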

Graphically represent SQL Data

Given a table with the following structure and 11M+ transactions:
ID ProductKey CloseDate Part PartAge Sales
1 XXXXP1 5/10/15 P1 13 100
2 XXXXP2 6/1/16 P1 0 15
3 XXXXP3 4/1/08 P1 0 280
4 XXXXP1 3/18/11 P1 0 10
5 XXXXP3 6/29/15 P1 45 15
6 XXXXP1 8/11/13 P1 30 360
Products XXXXP1 and XXXXP3 are entered more than once since they are resales. PartAge = 0 indicates it's a new sale. So these products went from:
New Sale --> ReSale --> ReSale
Using a self-joining query, I can retrieve all the products which were resales. But is there a way to display these in a pretty graph or tree format?
Something which depicts the life-span of the sale transaction of the product?
Any ideas will be appreciated.
TIA,
B

Multiple charts in a SQL Report

I'm preparing a report in SQL Server Reporting Services 2012 and I want to show a variable number of charts based on the data I have.
So, the (simplified) data source looks like this:
ID Name Group Sales
=============================
1 apples fruit 15
2 bananas fruit 25
3 carrots vegetable 10
4 broccoli vegetable 19
5 tuna fish 15
For each group, show a chart based on the names and values:
chart 1 - fruit sales
chart 2 - vegetable sales
chart 3 - fish sales
and so on...
But I do not want to hard-code the group names: if a new group is added to the db, a new chart should appear in the report.
Create a table based on your dataset, grouped on "Group", and then embed the chart in the table; SSRS should take care of the rest.