How do I create a data frame after using groupby.tail - pandas

I want to plot the terrorist groups in my data. I have grouped the data and used .tail, but I cannot plot the result; this is what I get:
<bound method Series.count of terrorist_group
Revolutionary Armed Forces of Colombia (FARC)        105
Al-Qaida in the Arabian Peninsula (AQAP)             142
Tehrik-i-Taliban Pakistan (TTP)                      157
Maoists                                              165
New People's Army (NPA)                              210
Boko Haram                                           234
Al-Shabaab                                           325
Islamic State of Iraq and the Levant (ISIL)          374
Taliban                                              775
Unknown                                             7973
dtype: int64>
terrorgroup=year2013.groupby("terrorist_group").size().sort_values(inplace=False)
gtf=terrorgroup.tail(10).count
gtf
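The `<bound method Series.count ...>` text suggests that `.count` is referenced without parentheses, so `gtf` holds the method object rather than data. `tail(10)` already returns a Series of counts, and `Series.reset_index(name=...)` turns it into a two-column DataFrame. A minimal sketch, using a small made-up stand-in for `year2013` because the real data is not shown:

```python
import pandas as pd

# Hypothetical stand-in for the question's year2013 DataFrame
year2013 = pd.DataFrame({
    "terrorist_group": ["Taliban", "Unknown", "Unknown", "Boko Haram",
                        "Taliban", "Unknown", "Al-Shabaab"],
})

# size() counts rows per group; sort_values() sorts ascending by default
terrorgroup = year2013.groupby("terrorist_group").size().sort_values()

# tail(10) keeps the 10 largest groups as a Series (no trailing .count)
top10 = terrorgroup.tail(10)

# Move the group names out of the index into a regular column
top10_df = top10.reset_index(name="count")
print(top10_df)
```

From there, something like `top10_df.plot.barh(x="terrorist_group", y="count")` should draw a horizontal bar chart (pandas plotting needs matplotlib installed).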

Related

creating legend for US cities map using basemap and matplotlib

I have a large DataFrame with 5 columns: state, city, count, latitude and longitude. I am trying to create a geographical map with Basemap that shows the 50 US states and their cities, where each city's count is shown as a red circle whose size indicates the count value.
Here is a sample of the DataFrame:
city         state  count  latitude   longitude
BROOKLYN     NY       831  40.649188  -73.933724
NEW YORK     NY       646  40.734332  -74.010112
CHICAGO      IL       614  41.850100  -87.650000
HOUSTON      TX       530  29.741797  -95.309376
BRONX        NY       415  40.816461  -73.862173
MIAMI        FL       401  25.752956  -80.271061
PHOENIX      AZ       382  33.859694  -112.115872
DALLAS       TX       311  32.902156  -96.794543
SAN ANTONIO  TX       259  29.518456  -98.60973
ANCHORAGE    AK        20  61.189063  -149.886241
HONOLULU     HI        56  21.271982  -157.821362
PONCE        PR        61  17.987655  -66.623600
I am trying to add a legend at the bottom of the map that shows the count levels using red circles.
This is my code:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

plt.figure(figsize=[12, 18])
base_map = Basemap(llcrnrlon=-119, llcrnrlat=22, urcrnrlon=-64, urcrnrlat=49,
                   projection='lcc', lat_1=32, lat_2=45, lon_0=-95)
# load the shapefile, use the name 'states'
#base_map.shadedrelief()
base_map.readshapefile('st99_d00', name='states', drawbounds=True)
# Get the location of each city and plot it
state_set = set(['AK', 'HI', 'PR'])  # states that should not get a text label
for lat, long, name, size in zip(df['latitude'].tolist(),
                                 df['longitude'].tolist(),
                                 df['state'].tolist(),
                                 df['count'].tolist()):
    x, y = base_map(long, lat)  # Basemap expects (lon, lat)
    base_map.plot(x, y, marker='o', color='Red', markersize=size/20)
    if name not in state_set:
        # label each state once, at the first of its cities
        state_set.add(name)
        plt.text(x, y, name, fontsize=11, color='k')
plt.title('****')
#plt.legend()
plt.show()
I added the following legend code. It works, but the legend markers are gray and packed so closely together that they are hard to read. How can I change their color and make them smaller or more spread out?
# make legend with dummy points
for a in [100, 300, 500, 700, 850]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' counts')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')
Is this the right way to add the legend? How can I adjust it?
I am stuck and pretty confused; I would appreciate any help and feedback.
TIA!
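The dummy-point approach is a common way to build such a legend. The gray likely comes from `c='k'` combined with `alpha=0.5`. A second mismatch: `base_map.plot(..., markersize=size/20)` sizes markers in points, while `scatter(s=...)` takes an area in points², so squaring `count/20` should make the legend circles match the map circles. The sketch below wraps this in a hypothetical helper `add_count_legend` (not part of matplotlib or Basemap); `ncol`, `columnspacing`, `handletextpad` and `borderpad` are standard `legend` parameters used here to spread the entries out. The `Agg` line only lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs headless
import matplotlib.pyplot as plt


def add_count_legend(ax, counts=(100, 300, 500, 700, 850), scale=20):
    """Hypothetical helper: legend of red circles sized like the map markers."""
    for c in counts:
        # plot(markersize=m) sizes markers in points, but scatter(s=...)
        # expects points**2, so square count/scale to match the map circles.
        ax.scatter([], [], color="red", alpha=0.6,
                   s=(c / scale) ** 2, label=f"{c} counts")
    return ax.legend(scatterpoints=1, frameon=False, loc="lower left",
                     ncol=len(counts),    # lay entries out in one row
                     columnspacing=2.0,   # horizontal gap between entries
                     handletextpad=1.0,   # gap between circle and its label
                     borderpad=1.0)       # padding inside the legend box


fig, ax = plt.subplots(figsize=(12, 8))
leg = add_count_legend(ax)
```

On the Basemap figure, calling `add_count_legend(plt.gca())` in place of the dummy-point loop, before `plt.show()`, should give red circles on a single row; lowering `scale` or the counts list changes their size.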

Aggregate data based on values appearing in two columns interchangeably?

home_team_name away_team_name home_ppg_per_odds_pre_game away_ppg_per_odds_pre_game
0 Manchester United Tottenham Hotspur 3.310000 4.840000
1 AFC Bournemouth Aston Villa 0.666667 3.230000
2 Norwich City Crystal Palace 0.666667 13.820000
3 Leicester City Sunderland 4.733333 3.330000
4 Everton Watford 0.583333 2.386667
5 Chelsea Manchester United 1.890000 3.330000
The home_ppg_per_odds_pre_game and away_ppg_per_odds_pre_game columns are essentially the same metric: the former is its value for the home team, the latter its value for the away team. I want the mean of this metric for each team, regardless of whether the team plays at home or away. In the example df, Manchester United is home_team_name in row 0 and away_team_name in row 5; I want Manchester United's mean to include both rows.
df.groupby("home_team_name")["home_ppg_per_odds_pre_game"].mean()
This only gives me the mean for games in which the team plays at home, but I want both home and away.
Since the two metrics are the same, you can append the home and away team metrics, like this:
data_df = pd.concat([df.loc[:,('home_team_name','home_ppg_per_odds_pre_game')], df.loc[:,('away_team_name','away_ppg_per_odds_pre_game')].rename(columns={'away_team_name':'home_team_name','away_ppg_per_odds_pre_game':'home_ppg_per_odds_pre_game'})])
Then you can use groupby to get the means:
data_df.groupby('home_team_name')['home_ppg_per_odds_pre_game'].mean().reset_index()
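A runnable version of the same stack-then-group idea on the six sample rows. As a small variation on the answer, both halves are renamed to neutral column names (`team` and `ppg_per_odds`, chosen here for illustration) with `set_axis` before `pd.concat`, and `ignore_index=True` avoids duplicate row labels:

```python
import pandas as pd

# Sample rows copied from the question's example DataFrame
df = pd.DataFrame({
    "home_team_name": ["Manchester United", "AFC Bournemouth", "Norwich City",
                       "Leicester City", "Everton", "Chelsea"],
    "away_team_name": ["Tottenham Hotspur", "Aston Villa", "Crystal Palace",
                       "Sunderland", "Watford", "Manchester United"],
    "home_ppg_per_odds_pre_game": [3.31, 0.666667, 0.666667, 4.733333, 0.583333, 1.89],
    "away_ppg_per_odds_pre_game": [4.84, 3.23, 13.82, 3.33, 2.386667, 3.33],
})

# Give the home and away halves identical column names, then stack them
cols = ["team", "ppg_per_odds"]
home = df[["home_team_name", "home_ppg_per_odds_pre_game"]].set_axis(cols, axis=1)
away = df[["away_team_name", "away_ppg_per_odds_pre_game"]].set_axis(cols, axis=1)
long_df = pd.concat([home, away], ignore_index=True)

# One mean per team, counting home and away games alike
team_means = long_df.groupby("team")["ppg_per_odds"].mean()
print(team_means)
```

Manchester United's value is then the mean of 3.31 (row 0, home) and 3.33 (row 5, away), i.e. 3.32.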

Unable to create new features in Machine learning

I have a dataset loaded into a pandas DataFrame named df.
The dataset has 50,000 rows; here are the first 5:
Name_Restaurant    cuisines_available           Average cost
Food Heart         Japanese, chinese            60$
Spice n Hungary    Indian, American, mexican    42$
kfc, Lukestreet    Thai, Japanese               29$
Brown bread shop   American                     11$
kfc, Hypert mall   Thai, Japanese               40$
I want to create a column that contains the number of cuisines available.
I am trying this code:
df['no._of_cuisines_available']=df['cuisines_available'].str.len()
But instead of the number of cuisines, it gives the number of characters.
For example, for the first row the output should be 2, but it shows 17.
I also need a new column that contains the number of stores for each restaurant. For example, kfc has 2 stores here: "kfc, Lukestreet" and "kfc, Hypert mall". I have no idea how to code this.
i)
df['cuisines_available'].str.split(',').apply(len)
ii)
df['Name_Restaurant'].str.split(',', expand=True).melt()['value'].str.strip().value_counts()
What ii) does: split the names at ',' so that each piece lands in its own column, use melt to stack those columns into one long column, strip surrounding spaces, and count how often each entry occurs.
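Both answers written as new columns on the five sample rows. For ii) this assumes, as the question's example implies, that the brand is the text before the first comma in `Name_Restaurant`; `no_of_stores` is an illustrative column name. Mapping `value_counts()` back onto each row gives every kfc row the value 2 instead of producing a separate summary:

```python
import pandas as pd

# Based on the question's first five rows (cost column omitted)
df = pd.DataFrame({
    "Name_Restaurant": ["Food Heart", "Spice n Hungary", "kfc, Lukestreet",
                        "Brown bread shop", "kfc, Hypert mall"],
    "cuisines_available": ["Japanese, chinese", "Indian, American, mexican",
                           "Thai, Japanese", "American", "Thai, Japanese"],
})

# i) split into a list of cuisines, then count list elements (not characters)
df["no._of_cuisines_available"] = df["cuisines_available"].str.split(",").str.len()

# ii) assumption: the brand is the text before the first comma
brand = df["Name_Restaurant"].str.split(",").str[0].str.strip()
# count rows per brand and broadcast that count back to every row
df["no_of_stores"] = brand.map(brand.value_counts())
print(df[["Name_Restaurant", "no._of_cuisines_available", "no_of_stores"]])
```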

How to find and replace between source and destination files (worksheets) in Excel?

I have an Excel file with a state roster that looks like this:
Abbreviation Full
AL Alabama
AK Alaska
AZ Arizona
CA California
Then there is a file of state temperatures like this:
State Temperature
AK 92
AZ 128
CA 109
So some states are in the roster but not in the temperature file (AL, in this case).
How can I replace the abbreviations in the temperature file with the full names automatically (e.g., with a VBA macro or script)? The new temperature file would look like:
State Temperature
Alaska 92
Arizona 128
California 109
As a follow-up: would the approach differ if some states were in the temperature file but not in the roster file?
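You could use a formula in a NewTemperature sheet: put the full state names in column A, then enter this formula in cell B2 and copy it down. It looks up each full name's abbreviation in the roster, then that abbreviation's temperature. No VBA required.
=INDEX(Temperature!$B:$B,MATCH(INDEX(StateRoster!$A:$A,MATCH(A2,StateRoster!$B:$B,0)),Temperature!$A:$A,0))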

I need to get the min, max, avg of an Array List using Java

The values come either from a JUnit test case or from a Scanner, so the method has to work in both scenarios. The ArrayList looks something like this:
Utah 5
Nevada 6
California 12
Oregon 8
Utah 9
California 10
Nevada 4
Nevada 4
Oregon 17
California 6
I need to be able to calculate, say, the average for Utah. I know how to compute an average; my biggest problem is selecting only the values whose name is Utah rather than all of them.
names = states, values = numbers, categories = what you are calculating
Here is the start of the method:
public static ArrayList<Double> summarizeData (ArrayList<String> names, ArrayList<Double> values ,ArrayList<String> categories, int operation)