Plotly base values are in percentage - pandas

I have a table in which my base values are percentages:
ID TYPE PERCENTAGE
1 gold 15%
2 silver 71.4%
3 platinum 20%
4 copper 88.88%
But Plotly doesn't like that.
Do you know how I can tell it "hey, these data are percentages, please show me a percentage graph"?

I assume a Plotly answer is wanted, so I built the chart in Plotly. I converted the percentage strings in the existing data frame to decimal fractions, then set the y-axis tick format to '%'.
import plotly.express as px
df['PERCENTAGE'] = df['PERCENTAGE'].apply(lambda x:float(str(x).strip('%')) / 100)
fig = px.bar(df, x='TYPE', y='PERCENTAGE')
fig.update_layout(yaxis_tickformat='%')
fig.show()

Does this work for you?
import pandas as pd
import matplotlib.pyplot as plt
df.PERCENTAGE = df.PERCENTAGE.str.replace('%', '')  # remove % sign
df.PERCENTAGE = pd.to_numeric(df.PERCENTAGE)  # convert to numeric
plt.bar(df.TYPE, df.PERCENTAGE)  # plot
plt.ylabel('Percentage')
plt.show()
Note you can always check the type of your data with df.dtypes
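To make the dtype check concrete, here is a minimal sketch. The DataFrame is a hypothetical reconstruction of the table in the question, not the asker's actual data.

```python
import pandas as pd

# Hypothetical reconstruction of the table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "TYPE": ["gold", "silver", "platinum", "copper"],
    "PERCENTAGE": ["15%", "71.4%", "20%", "88.88%"],
})

print(df.dtypes)  # PERCENTAGE holds strings here, not numbers

# Strip the trailing % sign, then convert the remaining text to floats.
df["PERCENTAGE"] = pd.to_numeric(df["PERCENTAGE"].str.rstrip("%"))

print(df.dtypes)  # PERCENTAGE is now float64
```

If the second printout still does not show a numeric type, some value probably contains something other than digits and a % sign.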

Related

Change the stacked bar chart to Stacked Percentage Bar Plot

How can I change this stacked bar into a stacked Percentage Bar Plot with percentage labels:
here is the code:
df_responses= pd.read_csv('https://raw.githubusercontent.com/eng-aomar/Security_in_practice/main/secuirtyInPractice.csv')
df_new =df_responses.iloc[:,9:21]
image_format = 'svg' # e.g .png, .svg, etc.
# initialize empty dataframe
df2 = pd.DataFrame()
# group by each column counting the size of each category values
for col in df_new:
    grped = df_new.groupby(col).size()
    grped = grped.rename(grped.index.name)
    df2 = df2.merge(grped.to_frame(), how='outer', left_index=True, right_index=True)
# plot the merged dataframe
df2.plot.bar(stacked=True)
plt.show()
You can calculate the percentages yourself, for example in a new column of your dataframe, since you already have the absolute values, and plot that column instead.
Using sum() and division on dataframes should get you there quickly.
You might want to look at the GeeksForGeeks post that shows how this can be done.
EDIT
I have now adjusted your program so that it gives the result you want (or at least the result I think you want).
The two key functions I used that you did not are value_counts() and df.transpose(). You might want to read up on both, as they are helpful in many situations.
import pandas as pd
import matplotlib.pyplot as plt
df_responses = pd.read_csv('https://raw.githubusercontent.com/eng-aomar/Security_in_practice/main/secuirtyInPractice.csv')
df_new = df_responses.iloc[:, 9:21]
image_format = 'svg'  # e.g .png, .svg, etc.
# initialize empty dataframe providing the columns
df2 = pd.DataFrame(columns=df_new.columns)
# loop over all columns
for col in df_new.columns:
    # counting occurrences of each value can be done by value_counts()
    val_counts = df_new[col].value_counts()
    # replace nan values with 0 (fillna returns a new Series, so assign it back)
    val_counts = val_counts.fillna(0)
    # calculate the sum of all categories
    total = val_counts.sum()
    # use value count for each category and divide it by the total count of all categories
    # and multiply by 100 to get nice percent values
    df2[col] = val_counts / total * 100
# columns and rows need to be transposed in order to get the result we want
df2.transpose().plot.bar(stacked=True)
plt.show()
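As a variation on the loop above, the percentage table can also be built in one step with value_counts(normalize=True). This is only a sketch: the small DataFrame and its column names q1 and q2 are made up to stand in for the survey columns.

```python
import pandas as pd

# Made-up survey answers standing in for the CSV columns.
answers = pd.DataFrame({
    "q1": ["Yes", "No", "Yes", "Yes"],
    "q2": ["No", "No", "Maybe", "Yes"],
})

# normalize=True returns fractions per category. apply() aligns the per-column
# results on the union of categories; categories absent from a column become
# NaN, which fillna(0) turns into 0 before scaling to percent.
pct = answers.apply(lambda col: col.value_counts(normalize=True)).fillna(0) * 100

print(pct.round(1))
```

Each column of pct sums to 100, so pct.T.plot.bar(stacked=True) should then draw one full-height bar per question.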

How to plot coordinates from single pandas series

I have a pandas series called df1['geometry.coordinates'] of coordinate values in the following format:
geometry.coordinates
0 [150.792711, -34.210868]
1 [151.551228, -33.023339]
2 [148.92149870748742, -34.767207772932835]
3 [151.033742, -33.919998]
4 [150.953963043732, -32.3935017885229]
... ...
432 [114.8927165, -28.902492300000002]
433 [115.34601918477634, -30.041742290803096]
434 [115.4632611, -30.8581035]
435 [121.42151909999998, -30.7804027]
436 [115.69424934340425, -30.680970908597665]
I want to plot each point on a graph, probably through using a scatter plot.
I tried: df1['geometry.coordinates'].plot.scatter() but it gets confused because it only reads it as one list value rather than two and therefore I always get the following error:
TypeError: scatter() missing 2 required positional arguments: 'x' and 'y'
Anyone know how I can solve this?
You need to separate the column containing the list so that you can specify x and y in the plot call.
You can split a column containing a list by constructing a data frame from a list.
pd.DataFrame(df1["geometry.coordinates"].to_list(), columns=['x', 'y']).plot.scatter(x="x", y="y")
Step 1: Split array into multiple columns
df1[['x','y']] = pd.DataFrame(df1['geometry.coordinates'].tolist(), index= df1.index)
Step 2: Plot
df1.plot.scatter(x = 'x', y = 'y', s = 30) #s is size of dots
You are not passing x and y to scatter(), so the error is expected. Once the lists are split into two columns, something along the lines of df.plot.scatter(x=0, y=1) should work.
Also, if you hold the points as an N-by-2 NumPy array arr, transpose it so that each coordinate becomes a row: plt.scatter(arr.T[0], arr.T[1])
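The transposition idea can be sketched with NumPy as follows. The names coords, arr, xs and ys are illustrative, and the values are a few rows copied from the question.

```python
import numpy as np
import pandas as pd

# A few rows of the series from the question.
coords = pd.Series([
    [150.792711, -34.210868],
    [151.551228, -33.023339],
    [151.033742, -33.919998],
])

arr = np.array(coords.tolist())  # shape (3, 2): one row per point
xs, ys = arr.T                   # transposed to (2, 3): one row per coordinate

print(xs)
print(ys)
```

xs and ys can then be passed straight to plt.scatter(xs, ys).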
I did it this way.
import pandas as pd
import matplotlib.pyplot as plt
geometry = pd.Series([
    [150.792711, -34.210868],
    [151.551228, -33.023339],
    [148.92149870748742, -34.767207772932835],
    [151.033742, -33.919998],
    [150.953963043732, -32.3935017885229]])
df = pd.DataFrame(geometry.to_list(), columns=['x', 'y'])
plt.scatter(x=df['x'], y=df['y'],
            edgecolor='black')
plt.grid(alpha=.15)
plt.show()
You can try:
import pandas as pd
geometry_coordinates = [[150.792711, -34.210868],
                        [151.551228, -33.023339],
                        [148.92149870748742, -34.767207772932835],
                        [151.033742, -33.919998],
                        [150.953963043732, -32.3935017885229],
                        [114.8927165, -28.902492300000002],
                        [115.34601918477634, -30.041742290803096],
                        [115.4632611, -30.8581035],
                        [121.42151909999998, -30.7804027],
                        [115.69424934340425, -30.680970908597665]]
# the first value of each pair is the longitude, the second the latitude
geometry_coordinates = pd.DataFrame(geometry_coordinates, columns=['long', 'lat'])
geometry_coordinates.plot.scatter(x='long', y='lat')

How can I create a bubble chart using this data in seaborn?

I have all the data I need to plot in a single row, e.g.:
mcc_name   year_1  year_2   year_3   year_1_%  year_2_%  year_3_%
book shop  30000   1500.41  9006.77  NaN       -0.4708   -0.60379
I want the x axis to be the values in columns [year_1, year_2, year_3], the y axis to be the values in columns [year_1_%, year_2_%, year_3_%] (pct change), and the size of each bubble to be proportional to the values in [year_1, year_2, year_3].
sns.scatterplot(data=data_row , x=['year_1', 'year_2', 'year_3'], y=['year_1_%', 'year_2_%', 'year_3_%'], size="pop", legend=False, sizes=(20, 2000))
# show the graph
plt.show()
But I get this error:
ValueError: Length of list vectors must match length of `data` when both are used, but `data` has length 1 and the vector passed to `y` has length 3.
How can I plot this?
You need to have your data in long format:
import pandas as pd
import seaborn as sns
import numpy as np
df = pd.DataFrame(np.array([30000, 1500.41, 9006.77, np.nan, -0.4708, -0.60379]).reshape(1, -1),
                  columns=['year_1', 'year_2', 'year_3', 'year_1_%', 'year_2_%', 'year_3_%'],
                  index=['mcc_name'])
Usually you can use wide_to_long if your columns are named consistently, but in this case it may be easier to melt the two groups of columns separately and join them:
values = df.filter(regex='year_[0-9]$', axis=1).melt(value_name="value",var_name="year")
perc = df.filter(regex='_%', axis=1).melt(value_name="perc",var_name="year")
perc.year = perc.year.str.replace("_%","")
sns.scatterplot(data=values.merge(perc,on="year"),x = "year", y = "perc", size = "value")
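For comparison, here is a sketch of the wide_to_long route mentioned above. It assumes the columns are first renamed to a shared stub-plus-suffix pattern; the stub names value and perc, and the explicit mcc_name column, are choices made for this illustration.

```python
import numpy as np
import pandas as pd

# The single row from the question, with mcc_name as a regular column.
wide = pd.DataFrame({
    "mcc_name": ["book shop"],
    "year_1": [30000.0], "year_2": [1500.41], "year_3": [9006.77],
    "year_1_%": [np.nan], "year_2_%": [-0.4708], "year_3_%": [-0.60379],
})

# wide_to_long expects names of the form <stub><sep><suffix>, so rename
# year_N -> value_N and year_N_% -> perc_N (hypothetical stub names).
mapping = {}
for n in (1, 2, 3):
    mapping[f"year_{n}"] = f"value_{n}"
    mapping[f"year_{n}_%"] = f"perc_{n}"
renamed = wide.rename(columns=mapping)

long_df = pd.wide_to_long(
    renamed, stubnames=["value", "perc"], i="mcc_name", j="year", sep="_"
).reset_index()

print(long_df)
```

The resulting frame has one row per year with value and perc columns, so it can be passed to sns.scatterplot(data=long_df, x="year", y="perc", size="value") in the same way as the merged frame.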

Remove thousands(k) in plotly line plot

I have a Plotly line plot whose x axis is year and week as an integer. The data is correct for the year-weeks 202101, 202102, 202103, but the plot shows them as 202.1K, 202.1K, 202.1K. I would like the axis to show 202101, 202102, 202103 as well. Below is my code.
if chart_choice == 'line':
    print(dff['week'])
    dff = dff.groupby(['product','week'], as_index=False)[['CYSales']].sum()
    fig = px.line(dff, x='week', y=num_value, color='product')
return fig
thanks for help
Does tickformat = "000" help, perhaps? Have a look at this related question: Plotly for R: Remove the k that apears on the y axis when the dataset contains numbers larger than 1000

matplotlib: histogram is not displaying correctly

I extracted the data I need to analyze from a CSV file and loaded it into a DataFrame. I then grouped the rows by their region type, the "Reg" column.
datafileR = datafile = pd.read_csv("pixel_data.csv")
datafileR = pd.DataFrame(datafileR)
### Counting the number of each rows based on the "Reg":
datafileR["Reg"].value_counts()
This is the result I received:
Make a group called region based on the Reg column from dataframe: datafileR:
region = datafileR.groupby(["Reg"])
Now plot them in histogram:
sns.set_theme()
plt.hist(datafileR["Reg"].value_counts(), bins=[70,100,130,160,190],color=["grey"],
histtype='bar', align='mid', orientation='vertical', rwidth=0.85)
This is the image I received, but there should be five categories (Middle East and North Africa, Africa (excl MENA), Asia and Pacific, Europe and Eurasia, and Cross-regional) on the x-axis. I am not sure what went wrong. Also, how can I change the values on the y-axis so that it displays the actual counts?
You are trying to draw a bar plot, not a histogram. Please refer to https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html?highlight=bar#matplotlib.pyplot.bar
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# random example data standing in for the real "Reg" column
datafileR = pd.DataFrame({'reg': np.random.choice(['Asia','Africa','Europe'], size=1000)})
df = datafileR['reg'].value_counts()
plt.bar(x=df.index, height=df.values)
You can also use pandas' plotting functions:
df.plot.bar()
plt.tight_layout()