How to plot pie charts separately according to their rows using a pandas DataFrame - pandas

I would like to create one pie chart per row, so that each pie chart shows the 3 different columns for its respective year.
I managed to create the pie charts, but they are all squeezed together in one graph. How can I separate them?
This is my dataset:
sector year Total in Practice (OT) Total in Practice (SLP) Total in Practice (SLP)
0 2014 123 400 123
1 2015 234 456 123
2 2016 345 484 345
3 2017 345 539 566
4 2018 453 565 123
5 2019 454 598 234
6 2020 453 626 243
7 2021 755 682 243
This is my code:
df_all.T.plot.pie(df_all,subplots=True, figsize=(10, 3))
and this is how my plot ends up:
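One possible way to get a separate pie per year is to loop over the rows and draw each one on its own matplotlib axes. This is a minimal sketch, not a definitive fix: the third column name ("Other") is a placeholder, because the dataset header above lists "Total in Practice (SLP)" twice, and the 2 x 4 grid and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question; "Other" is an assumed name for the third column.
df_all = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021],
    "OT": [123, 234, 345, 345, 453, 454, 453, 755],
    "SLP": [400, 456, 484, 539, 565, 598, 626, 682],
    "Other": [123, 123, 345, 566, 123, 234, 243, 243],
})

rows = df_all.set_index("year")

# One axes per year, laid out on a 2 x 4 grid so the pies do not overlap.
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
for ax, (year, counts) in zip(axes.flat, rows.iterrows()):
    ax.pie(counts, labels=counts.index, autopct="%1.0f%%")
    ax.set_title(str(year))

fig.tight_layout()
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the grid; a larger figsize gives each pie more room.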

Related

Dynamically calculate difference columns based on a slicer - Power BI

I have a table with quarterly volume data, and a slicer that lets you choose which quarter/year you want to see volume per code for. The slicer has selections from 2019Q1 through 2021Q4. I need to create a dynamic difference column that adjusts depending on which quarter/year is selected in the slicer. I know I need to create a new measure using CALCULATE/FILTER, but I am a beginner in Power BI and am unsure how to write that formula.
Example of raw table data:
Code 2019Q1 2019Q2 2019Q3 2019Q 2020Q1 2020Q2 2020Q3 2020Q4
11111 232 283 289 19 222 283 289 19
22222 117 481 231 31 232 286 2 19
11111 232 397 94 444 232 553 0 188
22222 117 411 15 14 232 283 25 189
Example if 2019Q1 and 2020Q1 are selected:
Code 2019Q1 2020Q1 Difference
11111 232 222 10
22222 117 481 -364
11111 232 397 -165
22222 117 411 -294
Power BI doesn't work that way; this is an Excel pivot table setup. Nothing distinguishes the first row from the third, or the second from the fourth: they have the same code, so Power BI will aggregate their volumes. You could introduce a hidden index column, but then why not simply stick to Excel? The Power BI approach to the problem would be to unpivot (stack) your table into Code, Quarter and Volume columns, create two independent slicer tables for the minuend and the subtrahend, and then CALCULATE your aggregated differences based on the SELECTEDVALUE of the two slicers.
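The unpivot-then-subtract logic described in this answer can be illustrated outside Power BI. The sketch below uses pandas rather than DAX, so it is an analogy and not a Power BI measure; the function name difference and the two "selected" quarters are assumptions standing in for the two slicers.

```python
import pandas as pd

# Two of the quarterly columns from the question, including the duplicate codes.
wide = pd.DataFrame({
    "Code": [11111, 22222, 11111, 22222],
    "2019Q1": [232, 117, 232, 117],
    "2020Q1": [222, 232, 232, 232],
})

# Unpivot (stack) into Code / Quarter / Volume, as the answer suggests.
long = wide.melt(id_vars="Code", var_name="Quarter", value_name="Volume")


def difference(long_df, minuend, subtrahend):
    """Aggregate volume per code and quarter, then subtract the two selections."""
    totals = long_df.groupby(["Code", "Quarter"])["Volume"].sum().unstack("Quarter")
    return totals[minuend] - totals[subtrahend]


# Hypothetical slicer selections.
diff = difference(long, "2019Q1", "2020Q1")
print(diff)
```

Because rows with the same code are summed before subtracting, the result has one difference per code, which mirrors the aggregation behaviour the answer warns about.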

Pandas extract hierarchical info?

I have a dataframe which describes serial numbers of items arranged in boxes:
import pandas as pd
df = pd.DataFrame({'barcode': ['1000']*3 + ['2000']*4 + ['3000']*3, 'box_number': ['10']*2 + ['11'] + ['12']*4 + ['13', '14', '15'], 'serials': list(map(str, range(800, 810)))})
barcode box_number serials
0 1000 10 800
1 1000 10 801
2 1000 11 802
3 2000 12 803
4 2000 12 804
5 2000 12 805
6 2000 12 806
7 3000 13 807
8 3000 14 808
9 3000 15 809
I want to group them hierarchically to output to hierarchical XML, so that every barcode has a list of box numbers which each have list of serials in them.
So I did a groupby which seems to do exactly what I want:
df.groupby(['barcode','box_number'])['serials'].apply(' '.join)
barcode box_number
1000 10 800 801
11 802
2000 12 803 804 805 806
3000 13 807
14 808
15 809
Name: serials, dtype: object
Now, I want to extract this info practically the way it is displayed so that I get a row for each barcode with data grouped similar to this:
row['1000']== {'10': '800 801','11':'802'}
row['2000']== {'12': '803 804 805 806'}
row['3000']== {'13': '807','14':'808','15':'809' }
But I can't figure out how to get this done. I tried reset_index() and another groupby(), but that doesn't work on the existing result because it is a Series, and I can't work out the right way.
How should I do this most concisely? I looked over the questions here, but didn't find a similar issue.
Use a dictionary comprehension to get a nested dictionary, with Series.xs and Series.to_dict:
s = df.groupby(['barcode','box_number'])['serials'].apply(' '.join)
d = {lev: s.xs(lev).to_dict() for lev in s.index.levels[0]}
print (d)
{'1000': {'10': '800 801', '11': '802'},
'2000': {'12': '803 804 805 806'},
'3000': {'13': '807', '14': '808', '15': '809'}}
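Since the stated goal is hierarchical XML, the nested dictionary above can be walked directly with the standard library. This is only a sketch: the element and attribute names (barcodes, barcode, box, serial, id, number) are hypothetical, as the question does not specify the target schema.

```python
import xml.etree.ElementTree as ET

# Nested dictionary as produced by the answer above.
d = {
    '1000': {'10': '800 801', '11': '802'},
    '2000': {'12': '803 804 805 806'},
    '3000': {'13': '807', '14': '808', '15': '809'},
}

# Element and attribute names below are assumptions, not taken from the question.
root = ET.Element("barcodes")
for barcode, boxes in d.items():
    barcode_el = ET.SubElement(root, "barcode", id=barcode)
    for box_number, serials in boxes.items():
        box_el = ET.SubElement(barcode_el, "box", number=box_number)
        # The serials were joined with spaces, so split them back apart.
        for serial in serials.split():
            ET.SubElement(box_el, "serial").text = serial

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```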

Smoothed Average over rows and columns with pandas

I am trying to create a function that averages over both row and column. For example:
State 1943 1944 1945 1946 1947 (1947_AVG) 1948 (1948_AVG)
Alaska 1 2 3 4 5 2 6 3
CA 234 234 234 6677 34
I want code that will give me an average for 1947 using 1943, 1944, and 1945, then an average for 1948 using 1944, 1945, and 1946, and so on.
I currently have:
d3['pandas_SMA_Year'] = d3.iloc[:,1].rolling(window=3).mean()
But this is simply working over the rows, not the columns, and it doesn't take into account the fact that I'm looking 2 years back. Please and thank you for any guidance!
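One way to read the request is "a 3-year mean that ends two years before the target year". Under that assumption, a sketch is to transpose so the years become rows, take a rolling mean, and shift it forward by two positions. The frame below is a made-up example that extends the Alaska row to 1948, and the _AVG suffix simply follows the naming in the question.

```python
import pandas as pd

# Hypothetical frame: one row per state, one column per year.
d3 = pd.DataFrame(
    {"1943": [1], "1944": [2], "1945": [3], "1946": [4], "1947": [5], "1948": [6]},
    index=pd.Index(["Alaska"], name="State"),
)

# Transpose so years run down the rows, average each 3-year window,
# then shift by 2 so the window ending in 1945 lands on 1947.
sma = d3.T.rolling(window=3).mean().shift(2).T
sma = sma.add_suffix("_AVG")

print(sma[["1947_AVG", "1948_AVG"]])
```

The first four _AVG columns are NaN because no complete window is available that far back.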

Pandas doesn't split EIA API data into two different columns for easy access

I am importing EIA data which contains weekly storage data. The first column is the reported week and the second is storage.
When I import the data it shows two columns. The first column has no title and the second one has the title "Weekly Lower 48 States Natural Gas Working Underground Storage, Weekly (Billion Cubic Feet)".
I would like to plot the data using matplotlib, but I need to separate the columns first. I used df.iloc[100:,:0], which gives the first column (the week), but I cannot separate the second column.
import eia
import pandas as pd
import os
api_key = "mykey"
api = eia.API(api_key)
series_search = api.data_by_series(series='NG.NW2_EPG0_SWO_R48_BCF.W')
df = pd.DataFrame(series_search)
df1 = df.iloc[100:,:0]
Code Output
This output is a sample of all 486 rows. df.shape returns (486, 1) when it should be (486, 2).
2010 0101 01 3117
2010 0108 08 2850
2010 0115 15 2607
2010 0122 22 2521
2019 0322 22 1107
2019 0329 29 1130
2019 0405 05 1155
2019 0412 12 1247
2019 0419 19 1339
You can first cut the last 3 characters of the string and then convert it to datetime:
df['Date'] = pd.to_datetime(df['Date'].str[:-3], format='%Y %m%d')
print(df)
Date Value
0 2010-01-01 3117
1 2010-01-08 2850
2 2010-01-15 2607
3 2010-01-22 2521
4 2019-03-22 1107
5 2019-03-29 1130
6 2019-04-05 1155
7 2019-04-12 1247
8 2019-04-19 1339
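The answer above assumes a Date column already exists. The reported shape of (486, 1) suggests the week strings are actually the index rather than a column; if that is the case, reset_index() would turn them into a regular column first. The sketch below rebuilds a small frame under that assumption, and the names Date and Value are chosen to match the answer.

```python
import pandas as pd

# Assumed layout: week strings as the index, one storage column.
title = ("Weekly Lower 48 States Natural Gas Working Underground Storage, "
         "Weekly (Billion Cubic Feet)")
df = pd.DataFrame(
    {title: [3117, 2850, 2607]},
    index=["2010 0101 01", "2010 0108 08", "2010 0115 15"],
)

# Move the index into a column so there are two columns to work with.
df = df.reset_index()
df.columns = ["Date", "Value"]

# Drop the trailing " DD" duplicate and parse the rest as a date.
df["Date"] = pd.to_datetime(df["Date"].str[:-3], format="%Y %m%d")
print(df.shape)
```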

SQL Query: How to pull counts of two columns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I'm getting the following results, whereas I expect the provider counts to come from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12, 2 instead of 3 for Buss_ID 11, etc.).
SELECT BP.Buss_ID,
       COUNT(BP.NPI) PROVIDER_COUNT,
       COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
       ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0
If I understood it correctly, you might want to add a DISTINCT clause to the columns.
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3
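To illustrate the DISTINCT suggestion without relying on the fiddle, the sketch below loads the two sample tables into an in-memory SQLite database from Python and runs the query with COUNT(DISTINCT ...). SQLite is an assumption made for the demonstration; the question does not say which database engine is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACETS_Business_NPI_Provider (Buss_ID INTEGER, NPI INTEGER, Bussiness_Desc TEXT);
CREATE TABLE FACETS_PROVIDERs_Practitioners (NPI INTEGER, PRAC_NO INTEGER, PROV_NAME TEXT, PRAC_NAME TEXT);
""")

# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO FACETS_Business_NPI_Provider VALUES (?, ?, ?)",
    [(11, 222, "Eleven 222"), (12, 223, "Twelve 223"), (13, 224, "Thirteen 224"),
     (14, 225, "Fourteen 225"), (11, 226, "Eleven 226"), (12, 227, "Tweleve 227"),
     (12, 228, "Tweleve 228")],
)
conn.executemany(
    "INSERT INTO FACETS_PROVIDERs_Practitioners VALUES (?, ?, ?, ?)",
    [(222, 943, "P222", "PR943"), (222, 942, "P222", "PR942"), (223, 931, "P223", "PR931"),
     (224, 932, "P224", "PR932"), (224, 933, "P224", "PR933"), (226, 950, "P226", "PR950"),
     (227, 951, "P227", "PR951"), (228, 952, "P228", "PR952"), (228, 953, "P228", "PR953")],
)

# DISTINCT stops a provider from being counted once per joined practitioner row.
query = """
SELECT BP.Buss_ID,
       COUNT(DISTINCT BP.NPI)     AS PROVIDER_COUNT,
       COUNT(DISTINCT PP.PRAC_NO) AS PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
       ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
ORDER BY BP.Buss_ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data this gives provider counts of 2 for Buss_ID 11 and 3 for Buss_ID 12, matching the expectation stated in the question.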