My dataframe is a list of football games with varying stats, around 300 entries.
game_id team opp_team avg_marks
0 2919 STK BL 122
1 2919 BL STK 114
2 2920 RICH SYD 135
3 2920 SYD RICH 108
I would like to add the opposition's stats as a new column for each entry. The resulting dataframe would look like this:
game_id team opp_team avg_marks opp_avg_marks
0 2919 STK BL 122 114
1 2919 BL STK 114 122
2 2920 RICH SYD 135 108
3 2920 SYD RICH 108 135
Any suggestions would be most welcome; I'm new to this forum. I have tried mapping, but the entry is conditional on two columns, game_id and opp_team.
Ideally I would add it in the original spreadsheet, but I created a cumulative average for the season in pandas, so I was hoping there would be a way to incorporate this as well.
You could group on game_id and reverse the avg_marks values within each group (this assumes exactly two rows per game):
In [725]: df.groupby('game_id')['avg_marks'].transform(lambda x: x.to_numpy()[::-1])
Out[725]:
0 114
1 122
2 108
3 135
Name: avg_marks, dtype: int64
In [726]: df['opp_avg_marks'] = (df.groupby('game_id')['avg_marks']
                                   .transform(lambda x: x.to_numpy()[::-1]))
In [727]: df
Out[727]:
game_id team opp_team avg_marks opp_avg_marks
0 2919 STK BL 122 114
1 2919 BL STK 114 122
2 2920 RICH SYD 135 108
3 2920 SYD RICH 108 135
Or build a dict mapping team to avg_marks, then use map on opp_team:
In [729]: df['opp_team'].map(df.set_index('team')['avg_marks'].to_dict())
Out[729]:
0 114
1 122
2 108
3 135
Name: opp_team, dtype: int64
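The dict approach relies on each team appearing only once; when a team plays several games, `to_dict()` keeps just the last value per team. Since the question says the lookup depends on both game_id and opp_team, a self-merge on those two keys is one possible alternative. The following is a minimal sketch using the four sample rows; the names `opp` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "game_id": [2919, 2919, 2920, 2920],
    "team": ["STK", "BL", "RICH", "SYD"],
    "opp_team": ["BL", "STK", "SYD", "RICH"],
    "avg_marks": [122, 114, 135, 108],
})

# Lookup table keyed by (game_id, team), with the columns renamed so that
# the merge keys line up with (game_id, opp_team) on the original frame.
opp = df[["game_id", "team", "avg_marks"]].rename(
    columns={"team": "opp_team", "avg_marks": "opp_avg_marks"}
)

# A left merge keeps the original row order and adds the opponent's value.
result = df.merge(opp, on=["game_id", "opp_team"], how="left")
print(result)
```

Because the match is made on the game as well as the team, this should keep working once the same team shows up in many games.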
I'm new to SQL/PostgreSQL and have been hunting all over for help with a query that finds only the matching id values in a table, so that I can pull data from another table for a learning project. I have been trying to use COUNT, which doesn't seem right, and I'm struggling with GROUP BY.
Here is my table:
id acct_num sales_tot cust_id date_of_purchase
1 9001 1106.10 116 12-Jan-00
2 9002 645.22 125 13-Jan-00
3 9003 1096.18 137 14-Jan-00
4 9004 1482.16 118 15-Jan-00
5 9005 1132.88 141 16-Jan-00
6 9006 482.16 137 17-Jan-00
7 9007 1748.65 147 18-Jan-00
8 9008 3206.29 122 19-Jan-00
9 9009 1184.16 115 20-Jan-00
10 9010 2198.25 133 21-Jan-00
11 9011 769.22 141 22-Jan-00
12 9012 2639.17 117 23-Jan-00
13 9013 546.12 122 24-Jan-00
14 9014 3149.18 116 25-Jan-00
I'm trying to write a simple query that finds only the matching customer ids and outputs them to the query window.
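If "matching" means customer ids that occur on more than one row, one common pattern is GROUP BY with a HAVING filter on the count. The sketch below runs that query from Python against an in-memory SQLite database rather than PostgreSQL, purely so that it is self-contained; the table name `sales` is an assumption, since the question does not give one.

```python
import sqlite3

rows = [
    (1, 9001, 1106.10, 116, "12-Jan-00"), (2, 9002, 645.22, 125, "13-Jan-00"),
    (3, 9003, 1096.18, 137, "14-Jan-00"), (4, 9004, 1482.16, 118, "15-Jan-00"),
    (5, 9005, 1132.88, 141, "16-Jan-00"), (6, 9006, 482.16, 137, "17-Jan-00"),
    (7, 9007, 1748.65, 147, "18-Jan-00"), (8, 9008, 3206.29, 122, "19-Jan-00"),
    (9, 9009, 1184.16, 115, "20-Jan-00"), (10, 9010, 2198.25, 133, "21-Jan-00"),
    (11, 9011, 769.22, 141, "22-Jan-00"), (12, 9012, 2639.17, 117, "23-Jan-00"),
    (13, 9013, 546.12, 122, "24-Jan-00"), (14, 9014, 3149.18, 116, "25-Jan-00"),
]

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical table name used only for this sketch.
conn.execute(
    "CREATE TABLE sales (id INTEGER, acct_num INTEGER, sales_tot REAL, "
    "cust_id INTEGER, date_of_purchase TEXT)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# Group the rows by customer and keep only groups with more than one row.
query = """
SELECT cust_id, COUNT(*) AS purchases
FROM sales
GROUP BY cust_id
HAVING COUNT(*) > 1
ORDER BY cust_id
"""
repeat_customers = conn.execute(query).fetchall()
print(repeat_customers)
```

The SELECT statement itself uses only standard SQL, so it should carry over to PostgreSQL with the real table name substituted.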
I have a dataframe as below.
I want to group by "Cycle" and "Type". After the groupby, I want to perform several operations (sum, mean, var, std, rolling mean, linregress, ...) on the first 33%, middle 33% and last 33% of each group. How do I do it?
With head() and tail() I can select only the first and last few rows, and only if I know how many rows I need. Since the length of each group varies, I do not know these values. Can anyone guide me?
Cycle Type Time Values
2 2 101 20.402
2 2 102 20.402
2 2 103 20.402
2 2 104 20.402
2 2 105 20.402
2 2 106 20.383
2 2 107 20.383
2 2 108 20.383
2 2 109 20.383
2 2 110 20.383
2 2 111 20.36
2 2 112 20.36
2 2 113 20.36
2 2 114 20.36
2 2 115 20.36
2 2 116 20.36
2 2 117 20.36
2 2 118 20.36
2 2 119 20.36
2 2 120 20.36
2 2 121 20.348
2 2 122 20.348
2 2 123 20.348
2 2 124 20.348
2 2 125 20.348
3 1 126 20.34
3 1 127 20.34
3 1 128 20.34
3 1 129 20.34
3 1 130 20.34
3 1 131 20.337
3 1 132 20.337
3 1 133 20.337
3 1 134 20.337
3 1 135 20.337
3 1 136 20.342
3 1 137 20.342
3 1 138 20.342
3 1 139 20.342
3 1 140 20.342
3 1 141 20.342
3 1 142 20.342
3 1 143 20.342
3 1 144 20.342
3 1 145 20.342
3 1 146 20.335
3 1 147 20.335
3 1 148 20.335
3 1 149 20.335
5 2 102 20.402
5 2 103 20.402
5 2 104 20.402
5 2 105 20.402
5 2 106 20.383
5 2 107 20.383
5 2 108 20.383
5 2 109 20.383
5 2 110 20.383
5 2 111 20.36
5 2 112 20.36
5 2 113 20.36
5 2 114 20.36
5 2 115 20.36
5 2 116 20.36
5 2 117 20.36
5 2 118 20.36
5 2 119 20.36
Update: result achieved based on the suggestion from Valenteno
Here is one way using cumcount and transform with floor division. Each row gets a label 0, 1 or 2 for the third of its group it falls in; any leftover rows are clipped into the last third.
g=df.groupby(['Cycle','Type'])
s=(g.cumcount()//(g.Cycle.transform('count')//3)).clip(upper=2)
df.groupby([df.Cycle,df.Type,s]).apply(Yourfunctionhere)
This should be close to what you want. Here I use only sum and mean; feel free to add other functions to the agg argument list.
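To make the labelling idea concrete, here is a small self-contained sketch on made-up data (the values are not from the question). It assumes every (Cycle, Type) group has at least three rows, because otherwise the group size floor-divided by 3 is zero. The names `size`, `part` and `out` are illustrative.

```python
import pandas as pd

# Made-up data: one group of 7 rows and one group of 6 rows.
df = pd.DataFrame({
    "Cycle": [2] * 7 + [3] * 6,
    "Type": [2] * 7 + [1] * 6,
    "Values": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,
               10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

g = df.groupby(["Cycle", "Type"])

# Number of rows in each row's group, broadcast back to the original index.
size = g["Values"].transform("count")

# Position within the group, floor-divided by a third of the group size,
# gives 0, 1, 2 (and 3 for leftover rows, which clip folds into part 2).
part = (g.cumcount() // (size // 3)).clip(upper=2).rename("part")

# Aggregate each third separately.
out = df.groupby(["Cycle", "Type", part])["Values"].agg(["sum", "mean"])
print(out)
```

With 7 rows the parts contain 2, 2 and 3 rows, so the remainder always ends up in the last third.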
import pandas as pd

def sample(x):
    # aggregations applied to each third of the group
    aggrfunc = ['sum', 'mean']
    first = x.iloc[0:len(x)//3].agg(aggrfunc)
    middle = x.iloc[len(x)//3:2*len(x)//3].agg(aggrfunc)
    last = x.iloc[2*len(x)//3:].agg(aggrfunc)
    return pd.concat([first, middle, last], keys=['top 33%', 'middle 33%', 'bottom 33%'])

ddf = df.groupby(['Cycle', 'Type']).apply(sample)
Using your sample dataframe, this code produces ddf:
Cycle Type Time Values
Cycle Type
2 2 top 33% sum 16.0 16.0 836.0 163.159000
mean 2.0 2.0 104.5 20.394875
middle 33% sum 16.0 16.0 900.0 162.926000
mean 2.0 2.0 112.5 20.365750
bottom 33% sum 18.0 18.0 1089.0 183.180000
mean 2.0 2.0 121.0 20.353333
3 1 top 33% sum 24.0 8.0 1036.0 162.711000
mean 3.0 1.0 129.5 20.338875
middle 33% sum 24.0 8.0 1100.0 162.726000
mean 3.0 1.0 137.5 20.340750
bottom 33% sum 24.0 8.0 1164.0 162.708000
mean 3.0 1.0 145.5 20.338500
5 2 top 33% sum 30.0 12.0 627.0 122.374000
mean 5.0 2.0 104.5 20.395667
middle 33% sum 30.0 12.0 663.0 122.229000
mean 5.0 2.0 110.5 20.371500
bottom 33% sum 30.0 12.0 699.0 122.160000
mean 5.0 2.0 116.5 20.360000
I'm getting this error:
Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
Code:
import webbrowser
import pandas as pd
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
webbrowser.open(website)
league_frame = pd.read_clipboard()
The error above is raised by the last line.
I believe you need to use read_html. It returns a list of all parsed tables, and you can select a DataFrame by position:
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
#select first parsed table
df1 = pd.read_html(website)[0]
print (df1.head())
Win % Wins Losses Year Team Comment
0 0.798 67 17 1882 Chicago White Stockings best pre-modern season
1 0.763 116 36 1906 Chicago Cubs best 154-game NL season
2 0.721 111 43 1954 Cleveland Indians best 154-game AL season
3 0.716 116 46 2001 Seattle Mariners best 162-game AL season
4 0.667 108 54 1975 Cincinnati Reds best 162-game NL season
#select second parsed table
df2 = pd.read_html(website)[1]
print (df2)
Win % Wins Losses Season Team \
0 0.890 73 9 2015–16 Golden State Warriors
1 0.110 9 73 1972–73 Philadelphia 76ers
2 0.106 7 59 2011–12 Charlotte Bobcats
Comment
0 best 82 game season
1 worst 82-game season
2 worst season statistically
I have the following text file, which I'm trying to turn into a line graph in Excel automatically. It logs every 5 minutes from 08:00 until 18:00, so there are quite a few rows.
TIME Rec-Created Rec-Deleted Rec-Updated Rec-read Rec-wait Committed bi-writes Bi-reads DB-Writes DB-READ db-access Checkpoints Flushed
08:09:00 37 0 5 21276 0 1894 33 3 109 43 47691 1 0
08:14:00 49 0 144 20378 0 1225 143 0 88 192 53377 0 0
08:19:00 44 0 237 19902 0 1545 283 6 317 120 49668 2 0
08:24:00 51 0 129 12570 0 626 191 3 164 58 37811 1 0
08:29:00 61 0 49 14138 0 541 86 3 116 77 36836 1 0
08:34:00 59 0 144 58536 0 1438 209 3 143 3753 135427 1 0
08:39:00 85 0 178 28309 0 1822 209 6 209 80 70950 2 0
08:44:00 57 0 157 17940 0 554 132 3 170 92 47561 1 0
08:49:00 115 0 217 29961 0 1867 186 3 333 193 76057 1 0
08:54:00 111 0 225 23320 0 540 198 6 275 246 64138 2 0
08:59:00 41 0 152 15638 0 359 187 3 368 103 44558 1 0
I'm not too concerned about the line graph part; I'm more concerned with getting the data into Excel in the correct format.
I'm assuming I would need to use an array, but that is a little advanced for me at the moment, as I'm still trying to get to grips with VB and this is really my first venture into this world (as you can see from my previous post).
Any help or guidance would be appreciated.
(I'm studying VB for Dummies and Visual Basic Fundamentals: Development for Absolute Beginners from Channel 9 on MSDN.)
Thanks in advance.
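I would probably create a typed dataset with all of your columns. Let's call it YourDataset.
Then read the file in and add a row to your table for each row in the file. This is not fully functional, but it is an outline of a solution.
' requires Imports System.IO at the top of the file
Dim typedDataset = New YourDataset()
Using reader As New StreamReader("file.txt")
    reader.ReadLine() ' skip the header row
    While Not reader.EndOfStream
        Dim line = reader.ReadLine()
        ' split on spaces, ignoring empty entries from repeated spaces
        Dim rowData = line.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        'add a new row to typed dataset based on data above
    End While
End Using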
That is how you would get your data into VB.NET: it would sit in a table just like the Excel table. At that point, if you didn't care about Excel, you could use a graphing control like the one on this page, and view the data with a DataGridView: https://msdn.microsoft.com/en-us/library/dd489237(v=vs.140).aspx
But to go to Excel you would need to follow a guide like the one in the following link. You need to use Microsoft.Office.Interop.Excel.
http://www.codeproject.com/Tips/669509/How-to-Export-Data-to-Excel-in-VB-NET
Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I'm getting the following results, whereas I expect the provider counts from table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12, 2 instead of 3 for Buss_ID 11, etc.).
SELECT BP.Buss_ID,
COUNT(BP.NPI) PROVIDER_COUNT,
COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
ON PP.NPI=BP.NPI
GROUP BY BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0
If I understood correctly, you might want to add DISTINCT to the counted columns, e.g. COUNT(DISTINCT BP.NPI).
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3
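The join repeats each provider row once per matching practitioner, which is why a plain COUNT(BP.NPI) overcounts. As a rough check of the DISTINCT suggestion, the sketch below loads the two sample tables into an in-memory SQLite database from Python and counts distinct values. SQLite is used here only for convenience and is an assumption, as the question does not say which database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACETS_Business_NPI_Provider (
    Buss_ID INTEGER, NPI INTEGER, Bussiness_Desc TEXT);
CREATE TABLE FACETS_PROVIDERs_Practitioners (
    NPI INTEGER, PRAC_NO INTEGER, PROV_NAME TEXT, PRAC_NAME TEXT);
""")
conn.executemany(
    "INSERT INTO FACETS_Business_NPI_Provider VALUES (?, ?, ?)",
    [(11, 222, "Eleven 222"), (12, 223, "Twelve 223"), (13, 224, "Thirteen 224"),
     (14, 225, "Fourteen 225"), (11, 226, "Eleven 226"), (12, 227, "Tweleve 227"),
     (12, 228, "Tweleve 228")],
)
conn.executemany(
    "INSERT INTO FACETS_PROVIDERs_Practitioners VALUES (?, ?, ?, ?)",
    [(222, 943, "P222", "PR943"), (222, 942, "P222", "PR942"),
     (223, 931, "P223", "PR931"), (224, 932, "P224", "PR932"),
     (224, 933, "P224", "PR933"), (226, 950, "P226", "PR950"),
     (227, 951, "P227", "PR951"), (228, 952, "P228", "PR952"),
     (228, 953, "P228", "PR953")],
)

# DISTINCT makes each provider count once per business, even though the
# join emits one row per (provider, practitioner) pair.
query = """
SELECT BP.Buss_ID,
       COUNT(DISTINCT BP.NPI) AS PROVIDER_COUNT,
       COUNT(DISTINCT PP.PRAC_NO) AS PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
ORDER BY BP.Buss_ID
"""
counts = conn.execute(query).fetchall()
print(counts)
```

On the sample data this gives 2 providers for Buss_ID 11 and 3 for Buss_ID 12, matching the counts the question expects, while the practitioner counts stay at 3 and 4.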