Rendering Plot in Flexdashboard with Shiny - ggplot2

I'm trying to render a plot in a flexdashboard that incorporates Shiny elements. When I pass my plot through renderPlotly(), the plot is generated in my output, but the axis is cut off and the bars do not display. This is the code I'm using to render the plot:
renderPlotly({
  income_dat %>%
    drop_na(q41_what_is_the_estimated_yearly_income_of_your_household) %>%
    ggplot(aes(q41_what_is_the_estimated_yearly_income_of_your_household)) +
    geom_bar(fill = "#A1A75F") +
    coord_flip() +
    facet_wrap(input$demo1) +
    labs(x = "",
         y = "Total")
})
Conversely, when I use renderPlot() instead of renderPlotly(), the plot does render, but it is extremely small and fixed to the top-left corner of the plot area on the dashboard.
Here's a segment of my data:
head(income_dat)
# A tibble: 6 × 6
response_id q1_in_which_coun… q2_are_you_native_… q3_what_is_your… q41_what_is_the_… gender_id
<fct> <fct> <fct> <fct> <fct> <fct>
1 39 Honolulu No 55 + Prefer not to an… Female
2 40 Maui No 55 + $125,000 - less … Female
3 41 I don't live in … No 25 - 34 NA NA
4 42 Honolulu Yes 35 - 44 $105,000 - less … Male
5 43 Kaua'i No 55 + $65,000 - less t… Male
6 44 Maui No 55 + Prefer not to an… Male
County, gender identity, age group, and Hawaiian native status are the variables I'm trying to pass as Shiny inputs through input$demo1. The income variable is what I'm plotting with geom_bar(), and its discrete response options (the income brackets) should populate the axis.
Any advice on how to render this plot correctly with either renderPlotly() or renderPlot() would be appreciated.

Related

Set major x-ticks with a datetime column type

I want to clean up a matplotlib plot by removing the year from every x tick label except the January ticks.
My date column is a pandas datetime column.
This is my code:
import matplotlib.pyplot as plt
df_ref1 = data.groupby(['ref_date'])['fuel'].value_counts().unstack()
fig, ax = plt.subplots(figsize=(6,4), dpi=200);
df_ref1.plot(ax=ax, kind='bar', stacked=True)
ax.set_title('Distribuição de tipo de combustível')
ax.spines[['top','right', 'left']].set_visible(False)
ax.set_xlabel('')
plt.xticks(rotation = 45,ha='right');
ax.legend(['Gasolina','Diesel', 'Álcool'],bbox_to_anchor=(.85, .95, 0, 0), shadow=False, frameon=False)
plt.tight_layout()
I tried using:
ax.xaxis.set_major_formatter('%Y/%m')
ax.xaxis.set_minor_formatter('%m')
I also tried:
from matplotlib.dates import DateFormatter
ax.xaxis.set_major_formatter(DateFormatter('%Y/%m'))
ax.xaxis.set_minor_formatter(DateFormatter('%m'))
but then all the x tick labels turned into 1970/01.
Neither worked. Any tips?
One way to do this is to first truncate the labels to show just the year and month, and then show only every nth label (every 12th in your case), so that you see the first month of each year. This assumes the dates are in datetime format.
The data I used:
Date Petrol Diesel Alcohol
0 2021-01-01 1975 320 30
1 2021-02-01 1976 321 31
2 2021-03-01 1977 322 32
3 2021-04-01 1978 323 33
....
22 2022-11-01 1997 342 52
23 2022-12-01 1998 343 53
24 2023-01-01 1999 344 54
The updated code:
import matplotlib.pyplot as plt
df_ref1 = df_ref1.set_index('Date', drop=True)  ## Set it so the dates are in the index
## Your code
fig, ax = plt.subplots(figsize=(6,4), dpi=200)
ax.set_title('Distribuição de tipo de combustível')
#ax.spines[['top','right', 'left']].set_visible(False)
ax.set_xlabel('')
df_ref1.plot(ax=ax, kind='bar', stacked=True)
plt.xticks(rotation=45, ha='right')
ax.legend(['Gasolina', 'Diesel', 'Álcool'], bbox_to_anchor=(.85, .95, 0, 0), shadow=False, frameon=False)
## Get the labels using get_text() and keep the first 7 characters ('YYYY-MM')
labels = [item.get_text() for item in ax.get_xticklabels()]
labels = [x[:7] for x in labels]
ax.set_xticklabels(labels)
## Show only the 1st, 13th, ... label
every_nth = 12
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)
Update:
To show the tick labels as requested in the comments (year-month for January and just the month for the other months), use a similar approach: get the tick labels, keep the first 7 characters ('YYYY-MM') for January and the two characters at indices 5 and 6 ('MM') for the other months, then set the updated labels. Updated code:
df_ref1 = df_ref1.set_index('Date', drop=True)
fig, ax = plt.subplots(figsize=(6,4), dpi=200)
ax.set_title('Distribuição de tipo de combustível')
ax.set_xlabel('')
df_ref1.plot(ax=ax, kind='bar', stacked=True)
plt.xticks(rotation=45, ha='right')
ax.legend(['Gasolina', 'Diesel', 'Álcool'], bbox_to_anchor=(.85, .95, 0, 0), shadow=False, frameon=False)
## Get the labels using get_text()
labels = [item.get_text() for item in ax.get_xticklabels()]
## Show the correct chars depending on whether the month is January
## (assumes the first row is a January and no months are missing)
every_nth = 12
for n in range(len(labels)):
    if n % every_nth == 0:
        labels[n] = labels[n][:7]   ## Is Jan: show 'YYYY-MM' (first 7 chars)
    else:
        labels[n] = labels[n][5:7]  ## Is NOT Jan: show 'MM' (chars at indices 5 and 6)
## Update labels
ax.set_xticklabels(labels)
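The DateFormatter attempt in the question shows 1970/01 because a pandas bar plot is categorical. The bars sit at integer x positions 0, 1, 2, ..., and matplotlib's date formatters read those numbers as days since 1970-01-01. The sketch below assumes the dates are in the DataFrame's index, as after set_index('Date') above. It builds the labels from the dates themselves instead of slicing tick-label strings, so it does not depend on the series starting in January. The helper names month_labels and plot_stacked are hypothetical.

```python
import pandas as pd


def month_labels(dates):
    """Hypothetical helper: 'YYYY-MM' for January, two-digit month otherwise."""
    idx = pd.DatetimeIndex(dates)
    return [d.strftime('%Y-%m') if d.month == 1 else d.strftime('%m') for d in idx]


def plot_stacked(df_ref1):
    """Hypothetical helper: stacked bar plot of a date-indexed DataFrame."""
    import matplotlib.pyplot as plt  # imported here so month_labels needs only pandas

    fig, ax = plt.subplots(figsize=(6, 4), dpi=200)
    df_ref1.plot(ax=ax, kind='bar', stacked=True)
    # Bar plots put one bar at each integer position 0..n-1, so label those
    # positions directly from the index rather than with a date formatter
    ax.set_xticks(range(len(df_ref1)))
    ax.set_xticklabels(month_labels(df_ref1.index), rotation=45, ha='right')
    ax.set_xlabel('')
    fig.tight_layout()
    return fig, ax
```

For example, month_labels(pd.date_range('2021-01-01', periods=14, freq='MS')) gives '2021-01', then '02' through '12', then '2022-01' and '02'.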

Create a seaborn FacetGrid based on a crosstab table

My dataframe is structured like this:
ACTIVITY 2014 2015 2016 2017 2018
WALK 198 501 485 394 461
RUN 187 446 413 371 495
JUMP 45 97 88 103 78
JOG 1125 2150 2482 2140 2734
SLIDE 1156 2357 2530 2044 1956
My visualization goal: a FacetGrid of bar charts showing the percentage change over time, with each bar positive or negative depending on that year's percentage change. Each facet is one ACTIVITY type. For example, one facet would be a bar plot of WALK, another a bar plot of RUN, and so on. The x-axis would be time (2014, 2015, 2016, etc.) and the y-axis would be the percentage change for each year.
In my analysis, I added a percentage-change column for every year except the 2014 baseline. I used a simple pct_change() function that takes two columns from df and returns a new calculated column:
df['%change_2015'] = pct_change(df['2014'],df['2015'])
df['%change_2016'] = pct_change(df['2015'],df['2016'])
... etc.
With these new columns, I think I have the elements I need for my visualization goal. How can I do it with seaborn FacetGrids, specifically with bar plots?
Augmented dataframe (slice view):
ACTIVITY 2014 2015 2016 2017 2018 %change_2015 %change_2016
WALK 198 501 485 394 461 153.03 -3.19
RUN 187 446 413 371 495 xyz xyz
JUMP 45 97 88 103 78 xyz xyz
I tried reading through the seaborn documentation, but I had trouble understanding the configuration options: https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
Is the problem the way my dataframe is ordered and structured? I hope all of that made sense, and I appreciate any help with this.
Use:
import pandas as pd
import seaborn as sns
cols = ['ACTIVITY', '%change_2015', '%change_2016']
data = [['Jump', '10.1', '-3.19'], ['Run', '9.35', '-3.19'], ['Run', '4.35', '-1.19']]
df = pd.DataFrame(data, columns=cols)
## Reshape from wide to long: one row per (ACTIVITY, %change column)
dfm = pd.melt(df, id_vars=['ACTIVITY'], value_vars=['%change_2015', '%change_2016'])
dfm['value'] = dfm['value'].astype(float)
g = sns.FacetGrid(dfm, col='variable')
g.map(sns.barplot, 'ACTIVITY', "value")
Based on your comment (one facet per activity):
g = sns.FacetGrid(dfm, col='ACTIVITY')
g.map(sns.barplot, 'variable', "value")
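The answer's melt step reflects how seaborn's faceting works. It expects long-format data, with one row per plotted value. The sketch below applies that to the full table from the question (one facet per activity, years on the x-axis, and bars that go below zero for decreases). It assumes the year columns are named '2014' to '2018' as strings. It replaces the question's own pct_change() helper, which is not shown, with a plain (new - old) / old * 100 computation. It then draws the grid with seaborn's figure-level catplot(kind='bar'), which builds a FacetGrid internally. The names add_pct_change, to_long and plot_facets are hypothetical.

```python
import pandas as pd

YEARS = ['2014', '2015', '2016', '2017', '2018']


def add_pct_change(df, years=YEARS):
    """Hypothetical helper: add a '%change_<year>' column for each year after the first."""
    out = df.copy()
    for prev, cur in zip(years, years[1:]):
        out[f'%change_{cur}'] = (out[cur] - out[prev]) / out[prev] * 100
    return out


def to_long(df, years=YEARS):
    """Hypothetical helper: melt the %change columns to one row per (ACTIVITY, year)."""
    change_cols = [f'%change_{y}' for y in years[1:]]
    long = df.melt(id_vars='ACTIVITY', value_vars=change_cols,
                   var_name='year', value_name='pct_change')
    long['year'] = long['year'].str.replace('%change_', '', regex=False)
    return long


def plot_facets(long):
    import seaborn as sns  # imported here so the data steps need only pandas

    # One panel per activity; negative changes are drawn below zero
    g = sns.catplot(data=long, x='year', y='pct_change', col='ACTIVITY',
                    kind='bar', col_wrap=3, height=3)
    g.set_axis_labels('Year', '% change vs previous year')
    return g


# The table from the question
df = pd.DataFrame({
    'ACTIVITY': ['WALK', 'RUN', 'JUMP', 'JOG', 'SLIDE'],
    '2014': [198, 187, 45, 1125, 1156],
    '2015': [501, 446, 97, 2150, 2357],
    '2016': [485, 413, 88, 2482, 2530],
    '2017': [394, 371, 103, 2140, 2044],
    '2018': [461, 495, 78, 2734, 1956],
})
long = to_long(add_pct_change(df))
```

With this data, WALK gets about 153.03 for 2015 and -3.19 for 2016, matching the augmented table in the question. Calling plot_facets(long) then draws one bar chart per activity.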

Inconsistent pandas axis labels

I have a pandas DataFrame (df) with a column of labels (the 'Specimens' column here).
Specimens Sample Min_read_lg Avr_read_lg Max_read_lg
0 B.pleb_sili 1 32 249.741 488
1 B.pleb_sili 2 30 276.959 489
2 B.conc_sili 3 25 256.294 489
3 B.conc_sili 4 27 277.923 489
4 F1_1_sili 5 34 303.328 489
...
I tried to plot it as follows, but the labels on the x-axis don't match the actual values in the table. Does anyone know why this happens?
plot=df.plot.area()
plot.set_xlabel("Specimens")
plot.set_ylabel("Read length")
plot.set_xticklabels(df['Specimens'], rotation=90)
I think the plot.set_xticklabels call is not right, but I would like to understand why the labels on the x-axis are mismatched, and why most of them are missing.
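A likely cause is that df.plot.area() draws against the default integer index 0, 1, 2, ..., and matplotlib picks its own tick positions, often only every few rows. set_xticklabels() then assigns the specimen names in order to whatever ticks exist. The names land on the wrong positions, and the ones left over are never shown. The sketch below pins one tick per row before labeling, using the five rows shown above. It also plots only the read-length columns and passes stacked=False, because area plots are stacked by default and Sample is not a read length. The function name plot_read_lengths is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# The first five rows from the question
df = pd.DataFrame({
    'Specimens': ['B.pleb_sili', 'B.pleb_sili', 'B.conc_sili', 'B.conc_sili', 'F1_1_sili'],
    'Sample': [1, 2, 3, 4, 5],
    'Min_read_lg': [32, 30, 25, 27, 34],
    'Avr_read_lg': [249.741, 276.959, 256.294, 277.923, 303.328],
    'Max_read_lg': [488, 489, 489, 489, 489],
})


def plot_read_lengths(df):
    """Hypothetical helper: area plot with one labelled tick per row."""
    cols = ['Min_read_lg', 'Avr_read_lg', 'Max_read_lg']
    # stacked=False overlays the three series instead of summing them
    ax = df[cols].plot.area(stacked=False)
    # Pin exactly one tick per row, then label those ticks in row order
    ax.set_xticks(range(len(df)))
    ax.set_xticklabels(df['Specimens'], rotation=90)
    ax.set_xlabel('Specimens')
    ax.set_ylabel('Read length')
    ax.figure.tight_layout()
    return ax


ax = plot_read_lengths(df)
```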

Stata: Change the names of variables using the values of other variables

I have a dataset with variables that look like this:
[Screenshot of the dataset]
If possible, I would like to label the other variables with the name of the country they relate to. For example, ggdy1 is the gross debt/GDP ratio for country 1 (Austria), while ggdy2 is the gross debt/GDP ratio for country 2 (Belgium).
To avoid going back and forth between the dataset and the Results or Command windows, is there a way to automatically label the different variables (ggdy, pby, ...) with the name of the corresponding country?
I have 28 countries in my dataset and use Stata 15.
I have to say I think this is the wrong question. Your data structure is analogous to this:
* Example generated by -dataex-. For more info, type help dataex
clear
input float year str7 country1 float(y1 x1) str7 country2 float(y2 x2)
1990 "Austria" 12 16 "Belgium" 20 24
1991 "Austria" 14 18 "Belgium" 22 26
end
which is both logical and perverse for most Stata purposes. A simple reshape gets you to a structure that is much more useful for most analyses.
. reshape long country y x , i(year) j(which)
(note: j = 1 2)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 2 -> 4
Number of variables 7 -> 5
j variable (2 values) -> which
xij variables:
country1 country2 -> country
y1 y2 -> y
x1 x2 -> x
-----------------------------------------------------------------------------
. l
+----------------------------------+
| year which country y x |
|----------------------------------|
1. | 1990 1 Austria 12 16 |
2. | 1990 2 Belgium 20 24 |
3. | 1991 1 Austria 14 18 |
4. | 1991 2 Belgium 22 26 |
+----------------------------------+
The new variable which (holding the former suffix, 1 or 2) does no harm, but it is not essential.
P.S. What you ask for is programmable too, something like:
foreach v of var ggdy* {
    // country number: everything after the 4-character prefix "ggdy"
    local suffix = substr("`v'", 5, .)
    // country name stored in the first observation of country<suffix>
    local where = country`suffix'[1]
    label var `v' "ggdy `where'"
    label var pby`suffix' "pby `where'"
    label var cby`suffix' "cby `where'"
    label var fby`suffix' "fby `where'"
}

What type of graph can best show the correlation between 'Fare' (price) and 'Survival' (Titanic)?

I'm playing around with Seaborn and Matplotlib, and I'm trying to find the best type of graph to show the correlation between fare and the chance of survival in the Titanic dataset.
The Titanic fare column has many distinct values, ranging from 0 to about 500, and some of them are repeated often.
Here is a sample of value_counts():
titanic.fare.value_counts()
8.0500 43
13.0000 42
7.8958 38
7.7500 34
26.0000 31
10.5000 24
7.9250 18
7.7750 16
0.0000 15
7.2292 15
26.5500 15
8.6625 13
7.8542 13
7.2500 13
7.2250 12
16.1000 9
9.5000 9
15.5000 8
24.1500 8
14.5000 7
7.0500 7
52.0000 7
31.2750 7
56.4958 7
69.5500 7
14.4542 7
30.0000 6
39.6875 6
46.9000 6
21.0000 6
.....
91.0792 2
106.4250 2
164.8667 2
The survived column, on the other hand, has only two values:
>>> titanic.survived.head(10)
271 1
597 0
302 0
633 0
277 0
413 0
674 0
263 0
466 0
A histogram would only show the frequency of fares in certain ranges.
For a scatter plot I would need two variables, but "survived" has only two values, which would make for a strange plot.
Is there a way to show the rise of survivability as fare increases clearly through a line graph?
I know there is a correlation: if I sort the rows by fare in ascending order (0 to 500) and then run:
>>> titanic.head(50).survived.sum()
5
>>>titanic.tail(50).survived.sum()
37
Only 5 of the 50 passengers with the cheapest fares survived, versus 37 of the 50 with the most expensive fares.
Thanks.
This is what I did to show the correlation between the fare values and the chance of survival:
First, I created a new column, Fare Groups, that maps each fare to a fare range using pd.cut().
df['Fare Groups'] = pd.cut(df.Fare, [0,50,100,150,200,550])
Next, I created a pivot_table().
piv_fare = df.pivot_table(index='Fare Groups', columns='Survived', values = 'Fare', aggfunc='count')
Output:
Survived 0 1
Fare Groups
(0, 50] 484 232
(50, 100] 37 70
(100, 150] 5 19
(150, 200] 3 6
(200, 550] 6 14
Plot:
piv_fare.plot(kind='bar')
It seems that passengers with the cheapest tickets (0 to 50) had the lowest chance of survival. In fact, (0, 50] is the only fare range where more passengers died than survived, and by a wide margin: 484 versus 232, a survival rate of about 32%, compared with roughly 65% to 79% in the higher ranges.
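Because Survived is 0/1, its mean within a fare range is that range's survival rate. Plotting those rates answers the original request for a line graph directly. The sketch below assumes a DataFrame with Fare and Survived columns, as in the answer. It passes include_lowest=True to pd.cut, because with the default right-closed bins a fare of exactly 0 falls outside (0, 50]. That means the zero-fare passengers (value_counts() in the question lists 15) are left out of the pivot table above. The function names survival_rate_by_fare and plot_survival_rate are hypothetical, and the tiny DataFrame in the check is made up for illustration.

```python
import pandas as pd

FARE_BINS = [0, 50, 100, 150, 200, 550]


def survival_rate_by_fare(df, bins=FARE_BINS):
    """Hypothetical helper: share of passengers who survived in each fare range."""
    # include_lowest=True keeps fares of exactly 0 in the first bin
    groups = pd.cut(df['Fare'], bins, include_lowest=True)
    # Mean of a 0/1 column = fraction of 1s; observed=True drops empty ranges
    return df.groupby(groups, observed=True)['Survived'].mean()


def plot_survival_rate(rates):
    """Hypothetical helper: line graph of survival rate per fare range."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # Interval labels as strings give evenly spaced categorical x positions
    ax.plot(rates.index.astype(str), rates.values, marker='o')
    ax.set_xlabel('Fare range')
    ax.set_ylabel('Survival rate')
    ax.set_ylim(0, 1)
    return fig, ax
```

If you already have the pivot table above, the same rates (without the zero fares) are piv_fare[1] / piv_fare.sum(axis=1).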