Replace xticks with names - matplotlib

I am working on the Spotify dataset from Kaggle. I plotted a barplot showing the top artists with most songs in the dataframe.
But the X-axis is showing numbers and I want to show names of the Artists.
names = list(df1['artist'][0:19])
plt.figure(figsize=(8,4))
plt.xlabel("Artists")
sns.barplot(x=np.arange(1,20),
y=df1['song_title'][0:19]);
I tried passing the names both as a list and as a Series, but both raise an error.
How do I replace the numbers in the xticks with names?

Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Data
Data from Spotify - All Time Top 2000s Mega Dataset
df = pd.read_csv('Spotify-2000.csv')
titles = pd.DataFrame(df.groupby(['Artist'])['Title'].count()).reset_index().sort_values(['Title'], ascending=False).reset_index(drop=True)
titles.rename(columns={'Title': 'Title Count'}, inplace=True)
# titles.head()
Artist              Title Count
Queen               37
The Beatles         36
Coldplay            27
U2                  26
The Rolling Stones  24
Plot
plt.figure(figsize=(8, 4))
chart = sns.barplot(x=titles.Artist[0:19], y=titles['Title Count'][0:19])
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
plt.show()

OK, so I didn't know this, although in hindsight it seems obvious.
Pass the names (or any string labels) as the argument for the x-axis.
Use plt.xticks(rotation=90) so the labels don't overlap.
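As a minimal sketch of the idea above, here is the same approach with plain matplotlib on a small made-up frame. The `counts` DataFrame and its column names are assumptions for illustration, not the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the real data: artist names and song counts.
counts = pd.DataFrame({
    "artist": ["Queen", "The Beatles", "Coldplay", "U2"],
    "song_count": [37, 36, 27, 26],
})

fig, ax = plt.subplots(figsize=(8, 4))
# Passing strings as x makes matplotlib treat the axis as categorical,
# so the names themselves become the tick labels.
ax.bar(counts["artist"], counts["song_count"])
ax.set_xlabel("Artists")
# Rotate the labels so long names do not overlap.
plt.xticks(rotation=90)
fig.tight_layout()
# Call plt.show() to display the figure when running as a script.
```

The same holds for seaborn: passing the name column as x in sns.barplot gives name labels instead of numbers.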

Related

How can I plot a map of a specific country using plotly

Using geopandas and matplotlib, I have plotted a map of India showing the Air Quality Index.
The link to my data is:
https://drive.google.com/file/d/1-xihM-LCB6dNfONbK28CJWOP_PVgXA8C/view?usp=share_link
How can I plot an interactive map with the names of the cities and the borders of the regions of India using plotly?
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib import cm, colors
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
#restricted to India.
ax = world[world.name == 'India'].plot(color='grey', edgecolor='white')
city_day_gdf.plot(column='AQI_Bucket', ax=ax, cmap='PuBuGn', markersize=city_day_gdf['AQI'])
norm = colors.Normalize(city_day_gdf.AQI.min(), city_day_gdf.AQI.max())
plt.colorbar(cm.ScalarMappable(norm=norm,cmap='PuBuGn'), ax=ax)
plt.title("A Map showing the descriptions of Air Quality Index in terms of AQI magnitude across India between 2015 and 2020")
plt.show()
You can scatter your data using px.scatter_mapbox(). I have reduced the data to just the newest reading per city, since you have not stated how you want to treat the time series.
To plot the regions of India you need some geometry. I have used GeoJSON from a repo on GitHub for this, and simplified the geometry because it is very detailed and would otherwise slow plotly down significantly.
Finally, pull it all together by layering the scatter on top of the choropleth.
import pandas as pd
import geopandas as gpd
import shapely.wkt
import plotly.express as px
import requests
url = "https://drive.google.com/file/d/1-xihM-LCB6dNfONbK28CJWOP_PVgXA8C/view?usp=share_link"
df = pd.read_csv("https://drive.google.com/uc?id=" + url.split("/")[-2], index_col=0)
df["Date"] = pd.to_datetime(df["Date"])
# just reduce data to last date for each city. Nothing in question indicates
# how to deal with time series
df = df.sort_values(["City", "Date"]).groupby("City", as_index=False).last()
# get some geometry for regions of india
gdf_region = gpd.read_file(
"https://raw.githubusercontent.com/Subhash9325/GeoJson-Data-of-Indian-States/master/Indian_States"
)
gdf_region["geometry"] = (
gdf_region.to_crs(gdf_region.estimate_utm_crs())
.simplify(5000)
.to_crs(gdf_region.crs)
)
fig = px.choropleth_mapbox(
gdf_region,
geojson=gdf_region.__geo_interface__,
locations=gdf_region.index,
color="NAME_1",
).update_traces(showlegend=False)
fig.add_traces(
px.scatter_mapbox(
df, lat="Lat", lon="Lon", color="AQI_Bucket", hover_data=["City"]
).data
)
fig.update_layout(
mapbox=dict(
style="carto-positron",
zoom=3,
center=dict(lat=df["Lat"].mean(), lon=df["Lon"].mean()),
)
)
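The "newest reading per city" reduction used above can be checked in isolation. This is a minimal sketch on a made-up frame; the `readings` data is hypothetical, and only the City, Date and AQI column names are taken from the code above.

```python
import pandas as pd

# Hypothetical readings: two dates per city, deliberately out of order.
readings = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Date": pd.to_datetime(
        ["2020-01-01", "2020-06-01", "2020-03-01", "2019-12-01"]
    ),
    "AQI": [300, 120, 90, 150],
})

# Sort so the newest row of each city comes last, then keep that row.
latest = (
    readings.sort_values(["City", "Date"])
    .groupby("City", as_index=False)
    .last()
)
print(latest)
```

Note that groupby(...).last() takes the last non-null value per column, so with missing data the result may combine values from different dates; groupby(...).tail(1) keeps whole rows instead.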

How to change legend labels in scatter matrix

I have a scatter matrix that I want to change the labels for. In the legend on the right-hand side, I want to change the blue color 1 to say Mystery and the red color 2 to say Science. I also want to change the labels of each graph to their counterparts [Spicy, Savory, and Sweet]. I tried using a dict to relabel, but then my charts came out wrong.
import plotly.express as px
fig = px.scatter_matrix(df,
dimensions=["Q12_Spicy", "Q12_Sav", "Q12_Sweet", ],color="Q11_Ans"
)
fig.show()
You can create a new column called Q11_Labels that maps 1 to Mystery and 2 to Science from the Q11_Ans column, and pass color='Q11_Labels' to the px.scatter_matrix function. If you still want the legend to display the original column name, you can pass a dictionary to the labels parameter of the px.scatter_matrix function with labels={"Q11_Labels":"Q11_Ans"}
Then you can extend this dictionary to include the other column-name-to-display-name mappings as well, so that [Spicy, Savory, Sweet] are displayed instead of [Q12_Spicy, Q12_Sav, Q12_Sweet].
import numpy as np
import pandas as pd
import plotly.express as px
## recreate random data with the same columns
np.random.seed(42)
df = pd.DataFrame(
np.random.randint(0,100,size=(100, 3)),
columns=["Q12_Spicy", "Q12_Sav", "Q12_Sweet"]
)
df["Q11_Ans"] = np.random.randint(1,3,size=100)
df["Q11_Ans"] = df["Q11_Ans"].astype("category")
df = df.sort_values(by="Q11_Ans")
## remap the values of 1 and 2 to their meanings, then pass this as the color
df["Q11_Labels"] = df["Q11_Ans"].map({1: "Mystery", 2: "Science"})
## pass a dictionary to the labels parameter
fig = px.scatter_matrix(df,
dimensions=["Q12_Spicy", "Q12_Sav", "Q12_Sweet"],color="Q11_Labels",
labels = {"Q12_Spicy":"Spicy","Q12_Sav":"Savory","Q12_Sweet":"Sweet", "Q11_Labels":"Q11_Ans"}
)
fig.show()

how to plot graded letters like A* in matplotlib

I'm a complete beginner and I have a college stats project: I'm comparing exam scores for our year group and the one below. I collected my own data, and since I do CS I decided to try visualizing the data with pandas and matplotlib (my first time). I was able to read the csv file into a dataframe with columns = Level,Grade,Difficulty,Revision,Happy,MAG. Level is just the year group, e.g. AS or A2, and MAG is like a minimum expected grade; the rest are numeric values out of 5.
I want to do some type of plotting but I can't seem to get it to work.
I want to plot revision against difficulty for the AS group and try to show a correlation. I also want to show a bar chart (if appropriate) for Grade vs MAG.
here is the csv https://docs.google.com/spreadsheets/d/169UKfcet1qh8ld-eI7B4U14HIl7pvgZfQLE45NrleX8/edit?usp=sharing
this is the code so far:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('Report Task.csv')
df.columns = ['Level','Grade','Difficulty','Revision','Happy','MAG'] #numerical values are out of 5
df[df.Level.str.match('AS')] #to get only AS group
plt.plot(df.Revision, df.Difficulty)
this is my first time ever posting on stack so im really sorry if i did something wrong.
For difficulty vs revision, you were using a line plot. You're probably looking for a scatter plot:
df = df[df.Level.str.match('AS')] # note the extra `df =` as per comments
plt.scatter(x=df.Revision, y=df.Difficulty)
plt.xlabel('Revision')
plt.ylabel('Difficulty')
Alternatively you can plot via pandas directly:
df = df[df.Level.str.match('AS')] # note the extra `df =` as per comments
df.plot.scatter(x='Revision', y='Difficulty')
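For the Grade vs MAG bar chart, one possible approach is to count how often each letter grade occurs in each column and plot the counts side by side. This is only a sketch: the sample values and the `grade_order` list are assumptions, since the real CSV contents and grade scale are not shown here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample; the real values would come from the CSV.
df = pd.DataFrame({
    "Grade": ["A*", "B", "A", "C", "A", "B"],
    "MAG": ["A", "B", "A*", "B", "A", "C"],
})

# Assumed ordering of the letter grades, from lowest to highest.
grade_order = ["E", "D", "C", "B", "A", "A*"]

# Count each grade per column, forcing the assumed order on the index
# and filling grades that never occur with zero.
counts = pd.DataFrame({
    col: df[col].value_counts().reindex(grade_order, fill_value=0)
    for col in ["Grade", "MAG"]
})

# Grouped bars: one pair (Grade, MAG) per letter grade.
ax = counts.plot.bar(rot=0)
ax.set_xlabel("Grade")
ax.set_ylabel("Number of students")
# Call plt.show() to display the figure when running as a script.
```

Letter grades such as A* work as tick labels because the index holds plain strings; only the ordering has to be supplied by hand.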

Stacked barplot in pandas- read from dataframe?

I am trying to create a stacked barplot using a data frame I have created that
looks like this
I want the stacked bar chart to show the 'types of exploitation' on the x axis, and then the male and female figures stacked on top of each other under these headings.
Is there a way to do this reading the info from my df? I have read about creating an index to do this but do not understand if this is the solution?
I also need a legend showing 'male' and 'female'
You can stack bars on top of each other with the bottom parameter of plt.bar in matplotlib.
Step 1: Create dataframe and import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
d = {'male': [37,1032,1], 'female': [96,134,1]}
df = pd.DataFrame(data=d, index=['a', 'b', 'c'])
Step 2: Create graph
r = [0,1,2]
bars1 = df['female']
bars2 = df['male']
plt.bar(r, bars1)
plt.bar(r, bars2,bottom=bars1, color='#557f2d')
plt.xticks(r, df.index, fontweight='bold')
plt.legend(labels = ['female', 'male'])
plt.show()
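If the counts are already in a dataframe indexed by category, pandas can also draw the stacked chart directly. A minimal sketch using the same made-up numbers as above; the axis label text is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same illustrative data as above: rows are categories, columns are groups.
df = pd.DataFrame(
    {"male": [37, 1032, 1], "female": [96, 134, 1]},
    index=["a", "b", "c"],
)

# stacked=True stacks the columns in the order given; the legend is
# built automatically from the column names.
ax = df[["female", "male"]].plot.bar(stacked=True, rot=0)
ax.set_xlabel("Type of exploitation")
ax.set_ylabel("Count")
# Call plt.show() to display the figure when running as a script.
```

Here the dataframe index supplies the x-axis labels, which is why setting the exploitation types as the index is the usual suggestion.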

How do I connect two sets of XY scatter values in MatPlotLib?

I am using pandas and Matplotlib to read data from an Excel file and create a scatter plot.
Here is a minimal sample table
In my scatter plot, I have two sets of XY values. In both sets, my X values are country population. I have Renewable Energy Consumed as my Y value in one set and Non-Renewable Energy Consumed in the other set.
For each Country, I would like to have a line from the renewable point to the non-renewable point.
My example code is as follows
import pandas as pd
import matplotlib.pyplot as plt
excel_file = 'example_graphs.xlsx'
datasheet = pd.read_excel(excel_file, sheet_name=0, index_col=0)
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
plt.show()
And it produces the following plot
I would love to be able to draw a line between the two sets of points, preferably a line I can change the thickness and color of.
As commented, you could simply loop over the dataframe and plot a line for each row.
import pandas as pd
import matplotlib.pyplot as plt
datasheet = pd.DataFrame({"Xcol" : [1,2,3],
"Y1col" : [25,50,75],
"Y2col" : [75,50,25]})
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
for n,row in datasheet.iterrows():
    ax.plot([row["Xcol"]]*2,row[["Y1col", "Y2col"]], color="limegreen", lw=3, zorder=0)
plt.show()
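Because both points of each pair share the same x value, the connecting lines are vertical, so the loop can also be replaced by a single Axes.vlines call. This is a sketch of that alternative on the same toy data, not the approach from the comments; the `connectors` name is just illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

datasheet = pd.DataFrame({
    "Xcol": [1, 2, 3],
    "Y1col": [25, 50, 75],
    "Y2col": [75, 50, 25],
})

ax = datasheet.plot.scatter("Xcol", "Y1col", c="b", label="set_one")
datasheet.plot.scatter("Xcol", "Y2col", c="r", label="set_two", ax=ax)

# One vertical segment per row, from (x, Y1col) to (x, Y2col).
# colors and linewidth control the look; zorder=0 keeps the lines
# behind the scatter markers.
connectors = ax.vlines(
    datasheet["Xcol"],
    datasheet["Y1col"],
    datasheet["Y2col"],
    colors="limegreen",
    linewidth=3,
    zorder=0,
)
# Call plt.show() to display the figure when running as a script.
```

This only applies when the two sets share x values; for pairs with different x coordinates, the row-by-row ax.plot loop remains the general solution.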