Pandas extract hierarchical info? - pandas

I have a dataframe which describes serial numbers of items arranged in boxes:
import pandas as pd
df = pd.DataFrame({'barcode': ['1000']*3 + ['2000']*4 + ['3000']*3, 'box_number': ['10']*2 + ['11'] + ['12']*4 + ['13', '14', '15'], 'serials': list(map(str, range(800, 810)))})
  barcode box_number serials
0    1000         10     800
1    1000         10     801
2    1000         11     802
3    2000         12     803
4    2000         12     804
5    2000         12     805
6    2000         12     806
7    3000         13     807
8    3000         14     808
9    3000         15     809
I want to group them hierarchically for output to hierarchical XML, so that every barcode has a list of box numbers, each of which has a list of the serials it contains.
So I did a groupby, which seems to do exactly what I want:
df.groupby(['barcode','box_number'])['serials'].apply(' '.join)
barcode  box_number
1000     10                    800 801
         11                        802
2000     12            803 804 805 806
3000     13                        807
         14                        808
         15                        809
Name: serials, dtype: object
Now I want to extract this information practically the way it is displayed, so that I get one row per barcode with the data grouped like this:
row['1000']== {'10': '800 801','11':'802'}
row['2000']== {'12': '803 804 805 806'}
row['3000']== {'13': '807','14':'808','15':'809' }
But I can't figure out how to get this done. I tried reset_index() and another groupby(), but neither works on the existing result because it is a Series, and I can't work out the right approach.
How should I do this most concisely? I looked through the questions here but didn't find a similar issue.

Use a dictionary comprehension to build the nested dictionary with Series.xs and Series.to_dict:
s = df.groupby(['barcode','box_number'])['serials'].apply(' '.join)
d = {lev: s.xs(lev).to_dict() for lev in s.index.levels[0]}
print (d)
{'1000': {'10': '800 801', '11': '802'},
'2000': {'12': '803 804 805 806'},
'3000': {'13': '807', '14': '808', '15': '809'}}
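Since the stated goal is hierarchical XML, the nested dictionary can be walked directly to build a tree. The following is only a sketch using the standard library's xml.etree.ElementTree; the element and attribute names (items, barcode, box, serial, id, number) are assumptions for illustration and do not come from the question.

```python
import xml.etree.ElementTree as ET

import pandas as pd

df = pd.DataFrame({
    'barcode': ['1000']*3 + ['2000']*4 + ['3000']*3,
    'box_number': ['10']*2 + ['11'] + ['12']*4 + ['13', '14', '15'],
    'serials': list(map(str, range(800, 810))),
})

# Same grouping and nested dictionary as in the answer above.
s = df.groupby(['barcode', 'box_number'])['serials'].apply(' '.join)
d = {lev: s.xs(lev).to_dict() for lev in s.index.levels[0]}

# Hypothetical XML layout: <items><barcode id=..><box number=..><serial>..</serial>
root = ET.Element('items')
for barcode, boxes in d.items():
    barcode_el = ET.SubElement(root, 'barcode', id=barcode)
    for box_number, serials in boxes.items():
        box_el = ET.SubElement(barcode_el, 'box', number=box_number)
        # The serials were joined with spaces, so split them back apart.
        for serial in serials.split():
            ET.SubElement(box_el, 'serial').text = serial

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

If the XML is the only output needed, iterating over df.groupby(['barcode', 'box_number']) directly would avoid joining and re-splitting the serials.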

Related

matching customer id value in postgresql

I'm new to SQL/PostgreSQL and have been hunting all over for help with a query that finds only the matching id values in a table, so that I can pull data from another table for a learning project. I have been trying to use the count function, which doesn't seem right, and I'm struggling with GROUP BY.
Here is my table:
id acct_num sales_tot cust_id date_of_purchase
1 9001 1106.10 116 12-Jan-00
2 9002 645.22 125 13-Jan-00
3 9003 1096.18 137 14-Jan-00
4 9004 1482.16 118 15-Jan-00
5 9005 1132.88 141 16-Jan-00
6 9006 482.16 137 17-Jan-00
7 9007 1748.65 147 18-Jan-00
8 9008 3206.29 122 19-Jan-00
9 9009 1184.16 115 20-Jan-00
10 9010 2198.25 133 21-Jan-00
11 9011 769.22 141 22-Jan-00
12 9012 2639.17 117 23-Jan-00
13 9013 546.12 122 24-Jan-00
14 9014 3149.18 116 25-Jan-00
I'm trying to write a simple query that finds only the matching customer ids and outputs them to the query window.
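One common reading of "matching customer ids" is "cust_id values that occur in more than one row", which is typically expressed with GROUP BY and HAVING COUNT(*) > 1. The sketch below runs that SQL through Python's built-in sqlite3 module so it is self-contained; the table name sales is an assumption (the question does not name the table), and the same SELECT statements should carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical table name; columns follow the question.
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, acct_num INTEGER, "
    "sales_tot REAL, cust_id INTEGER, date_of_purchase TEXT)"
)
rows = [
    (1, 9001, 1106.10, 116, "12-Jan-00"), (2, 9002, 645.22, 125, "13-Jan-00"),
    (3, 9003, 1096.18, 137, "14-Jan-00"), (4, 9004, 1482.16, 118, "15-Jan-00"),
    (5, 9005, 1132.88, 141, "16-Jan-00"), (6, 9006, 482.16, 137, "17-Jan-00"),
    (7, 9007, 1748.65, 147, "18-Jan-00"), (8, 9008, 3206.29, 122, "19-Jan-00"),
    (9, 9009, 1184.16, 115, "20-Jan-00"), (10, 9010, 2198.25, 133, "21-Jan-00"),
    (11, 9011, 769.22, 141, "22-Jan-00"), (12, 9012, 2639.17, 117, "23-Jan-00"),
    (13, 9013, 546.12, 122, "24-Jan-00"), (14, 9014, 3149.18, 116, "25-Jan-00"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# Customer ids that appear in more than one row.
repeated = [r[0] for r in conn.execute(
    "SELECT cust_id FROM sales GROUP BY cust_id "
    "HAVING COUNT(*) > 1 ORDER BY cust_id"
)]

# Full rows belonging to those customers.
matching = conn.execute(
    "SELECT id, acct_num, sales_tot, cust_id FROM sales "
    "WHERE cust_id IN (SELECT cust_id FROM sales "
    "                  GROUP BY cust_id HAVING COUNT(*) > 1) "
    "ORDER BY cust_id, id"
).fetchall()

print(repeated)
for row in matching:
    print(row)
```

The subquery result can also be joined against another table on cust_id, which is how the data from the second table would be pulled in.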

Plotting Webscraped data onto matplotlib

I recently managed to collect tabular data from a PDF file using camelot in Python. By "collect" I mean print it out on the terminal. Now I would like to find a way to automatically turn the results into a bar chart with matplotlib. How would I do that? Here's my code for extracting the tabular data from the PDF:
import camelot
tables = camelot.read_pdf("data_table.pdf", pages='2')
print(tables[0].df)
Here's an image of the table
Which then prints out a large table in my terminal:
    0             1          2       3    4
0   Country \nCase definition \nCumulative cases \...
1   Guinea        Confirmed  2727    156  1683
2                 Probable   374     *    374
3                 Suspected  7       *    ‡
4                 Total      3108    156  2057
5   Liberia**     Confirmed  3149    11   ‡
6                 Probable   1876    *    ‡
7                 Suspected  3982    *    ‡
8                 Total      9007    11   3900
9   Sierra Leone  Confirmed  8212    230  3042
10                Probable   287     *    208
11                Suspected  2604    *    158
12                Total      11103   230  3408
13  Total                    23 218  397  9365
I have some experience with matplotlib and know how to plot data manually, but not automatically from the PDF. Doing so would save me time, since I'm trying to automate the whole process.
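One possible approach is to clean the DataFrame that camelot returns and pass the cleaned columns to matplotlib. The sketch below rebuilds the printed table by hand as a stand-in for tables[0].df, so it runs without the PDF. It assumes that empty cells arrive as empty strings and that columns 0 to 2 hold country, case definition and cumulative cases, as the truncated header row suggests. The names raw, data and totals are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for tables[0].df (header row omitted); values copied from the printout.
raw = pd.DataFrame([
    ["Guinea", "Confirmed", "2727", "156", "1683"],
    ["", "Probable", "374", "*", "374"],
    ["", "Suspected", "7", "*", "‡"],
    ["", "Total", "3108", "156", "2057"],
    ["Liberia**", "Confirmed", "3149", "11", "‡"],
    ["", "Probable", "1876", "*", "‡"],
    ["", "Suspected", "3982", "*", "‡"],
    ["", "Total", "9007", "11", "3900"],
    ["Sierra Leone", "Confirmed", "8212", "230", "3042"],
    ["", "Probable", "287", "*", "208"],
    ["", "Suspected", "2604", "*", "158"],
    ["", "Total", "11103", "230", "3408"],
    ["Total", "", "23 218", "397", "9365"],
])

data = raw.rename(columns={0: "country", 1: "case_definition", 2: "cumulative_cases"})
# Blank country cells belong to the country named above them: forward-fill.
country = data["country"].mask(data["country"].eq("")).ffill()
data["country"] = country.str.rstrip("*")  # strip footnote markers

# Keep the per-country "Total" rows; the grand total has no case definition.
totals = data[data["case_definition"] == "Total"].copy()
totals["cumulative_cases"] = pd.to_numeric(
    totals["cumulative_cases"].str.replace(" ", ""), errors="coerce"
)

fig, ax = plt.subplots()
ax.bar(totals["country"].tolist(), totals["cumulative_cases"].tolist())
ax.set_xlabel("Country")
ax.set_ylabel("Cumulative cases")
fig.savefig("cumulative_cases.png")
```

With a real PDF, raw would be replaced by tables[0].df with the header row dropped; the column positions may differ depending on how camelot parses the page.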

How to remove unwanted values in data when reading csv file

When reading Pina_Indian_Diabities.csv, some of the values are strings, something like this:
+AC0-5.4128147485
734 2
735 4
736 0
737 8
738 +AC0-5.4128147485
739 1
740 NaN
741 3
742 1
743 9
744 13
745 12
746 1
747 1
As in row 738, there are such values in other rows and columns as well.
How can I drop them?
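A common way to handle this is to coerce every column to numeric with pd.to_numeric(errors="coerce"), which turns unparseable strings into NaN, and then drop the affected rows. The sketch below uses a small made-up frame with a hypothetical column name, since the question does not show the real header. As an aside, the "+AC0-" prefix looks like the UTF-7 encoding of a minus sign, so these cells may be negative numbers mangled by an encoding step; that is an inference, not something the question confirms.

```python
import pandas as pd

# Hypothetical column name and sample values modelled on the question.
df = pd.DataFrame(
    {"Pregnancies": ["2", "4", "0", "8", "+AC0-5.4128147485", "1", None, "3"]}
)

# Convert every column; anything that is not a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Cells that held a value originally but failed to parse.
bad = numeric.isna() & df.notna()

# Drop only the rows containing such cells; rows that were already NaN stay.
clean = numeric[~bad.any(axis=1)]
print(clean)
```

If rows with genuinely missing values should go as well, numeric.dropna() does both in one step.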

Generate Seaborn Countplot using column value as count

For the following table:
                              count_value
CPUCore Offline_RetentionAge
i7      183                          4184
        7                            1981
        30                            471
i5      183                          2327
        7                             831
        30                            250
Pentium 183                           333
        7                             125
        30                             43
2       183                           575
        7                             236
        31                             96
Is it possible to generate a seaborn countplot (or a normal count plot) like the following (generated using sns.countplot(x='CPUCore', hue="Offline_BackupSchemaIncrementType", data=dfCombined_df))?
The problem is that I need to use count_value as the count, rather than actually counting the Offline_RetentionAge values.
I think you need seaborn.barplot:
sns.barplot(x="CPUCore", y="count_value", hue="Offline_RetentionAge", data=df.reset_index())

SQL Query: How to pull counts of two columns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I get the following results, whereas I expect the provider counts from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12, 2 instead of 3 for Buss_ID 11, etc.).
SELECT BP.Buss_ID,
       COUNT(BP.NPI) PROVIDER_COUNT,
       COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
       ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0
If I understood correctly, you might want to add DISTINCT inside the COUNT calls, e.g. COUNT(DISTINCT BP.NPI), so that a provider joined to several practitioners is counted only once.
Here is an SQL Fiddle, which we can use to discuss further:
http://sqlfiddle.com/#!2/d9a0e6/3
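To illustrate the DISTINCT suggestion without relying on the fiddle, here is a small sketch that loads the two sample tables into an in-memory SQLite database through Python's sqlite3 module and runs the query with COUNT(DISTINCT ...). The lower-cased table names and the omission of the description and name columns are simplifications for the sketch; the SELECT itself uses only standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facets_business_npi_provider (Buss_ID INTEGER, NPI INTEGER)")
conn.execute("CREATE TABLE facets_providers_practitioners (NPI INTEGER, PRAC_NO INTEGER)")

# Sample rows from the question (description and name columns left out).
conn.executemany(
    "INSERT INTO facets_business_npi_provider VALUES (?, ?)",
    [(11, 222), (12, 223), (13, 224), (14, 225), (11, 226), (12, 227), (12, 228)],
)
conn.executemany(
    "INSERT INTO facets_providers_practitioners VALUES (?, ?)",
    [(222, 943), (222, 942), (223, 931), (224, 932), (224, 933),
     (226, 950), (227, 951), (228, 952), (228, 953)],
)

# DISTINCT stops a provider from being counted once per joined practitioner.
counts = conn.execute(
    "SELECT BP.Buss_ID, "
    "       COUNT(DISTINCT BP.NPI) AS PROVIDER_COUNT, "
    "       COUNT(DISTINCT PP.PRAC_NO) AS PRACTITIONER_COUNT "
    "FROM facets_business_npi_provider BP "
    "LEFT JOIN facets_providers_practitioners PP ON PP.NPI = BP.NPI "
    "GROUP BY BP.Buss_ID "
    "ORDER BY BP.Buss_ID"
).fetchall()

for row in counts:
    print(row)
```

On the sample data this gives provider counts of 2 and 3 for Buss_ID 11 and 12, which matches what the question expects.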