How to make a stacking bar using ggplot? - ggplot2

I have got this dataset. I am trying to do a stacking bar graph with proportions using ggplot for this data:
I am not really sure how to manipulate it into tables first! I only started learning R two weeks ago and I'm kind of stuck. I made a similar graph before, which I attached here.

I'm not sure if I got your question right, but I'll try to answer it. I see that this is your first question on Stack Overflow, so I'd advise you to post a minimal reproducible example in your next question.
1) "I am not really sure how to manipulate it into tables first!"
Copy the data into an Excel file, save it as CSV and import it into R with a base R command.
df <- read.csv('your_data.csv')
2) " do a stacking bar graph with proportions"
Your problem is very similar to the one mentioned in this question. Make sure to check it out, but I've already adapted the code below, see if it works.
library(ggplot2)
library(dplyr)
library(tidyr)
df <- read.csv('your_data.csv')
# Add an id variable for the filled regions and reshape
dfm <- df %>%
  mutate(Domain = factor(row_number())) %>%
  gather(variable, value, -Domain)
ggplot(dfm, aes(x = variable, y = value, fill = Domain)) +
  geom_bar(position = "fill", stat = "identity") +
  # or:
  # geom_bar(position = position_fill(), stat = "identity") +
  scale_y_continuous(labels = scales::percent_format())

Related

Extract ggplot smoothing function and save in dataframe

I am trying to extract my smoothing function from a ggplot and save it as dataframe (hourly datapoints) Plot shown here.
What I have tried:
I have already tried different interpolation techniques, but the results are not satisfying.
Linear interpolation causes a zigzag pattern.
Na_spline causes a weird curved pattern.
The real data behaves more like the geom_smooth curve of ggplot. I have tried to reproduce it with the following functions:
loess.data <- stats::loess(Hallwil2018_2019$Avgstemp~as.numeric(Hallwil2018_2019$datetime), span = 0.5)
loess.predict <- predict(loess.data, se = TRUE)
But it creates a list that misses the NA values and is much shorter.
You can pass a newdata argument to predict() to get it to predict a value for every time period you give it. For example (from randomly generated data):
df <- data.frame(date = sample(seq(as.Date('2021/01/01'),
                                   as.Date('2022/01/01'),
                                   by = "day"), 40),
                 var = rnorm(40, 100, 10))
mod <- loess(df$var ~ as.numeric(df$date), span = 0.5)
predict(mod, newdata = seq(as.Date('2021/01/01'), as.Date('2022/01/01'), by="day"))

ggplot geom_bar with errors not printing

I am assigning means and standard errors to a data frame and trying to draw a geom_bar with error bars for two experimental treatments across two genders:
library(ggplot2)
dff <- data.frame(group=c('NSI','NSI','SI','SI'),
gender = c('Female','Male','Female','Male'),
mean.Score =c(3.41,3.3,2.63,3.32),
se =c(1.92,2.03,1.73,2.21))
dff$group <- as.factor(dff$group)
dff$gender <- as.factor(dff$gender)
p <- ggplot(dff,aes(x= group,y=mean.Score,fill=gender))+
  scale_fill_manual(values = c("#F34444", "#0066CC"))+
  geom_bar(position = 'dodge',stat = 'identity',width=1.8)+
  geom_errorbar(aes(ymin=mean.Score-se, ymax=mean.Score+se),
                width=.2,
                position=position_dodge(1.8))+
  theme(plot.title = element_text(size = 10,hjust = 0.9))+
  scale_x_discrete(limits=c('NSI','NSI','SI','SI'))+
  ggtitle("Performance by Treatment & Gender")
plot(p)
There are two treatments, NSI and SI, across two genders, Female and Male; the data are the corresponding mean performance and the standard error of that performance. I am assigning these to a data frame and trying to plot a bar chart of the data with error bars. The code executes fine in the window, but then nothing shows up in the Plot window. Thanks for any help for a relative newbie!
Mary
Probably go to Tools > Global Options > Pane Layout in RStudio, and make sure Plots is checked for the console window.
(Screenshot: example of where to check in RStudio.)

use object of S4 class SeqExpressionSet to plot PCA with ggplot2

I have made an object of S4 class SeqExpressionSet with EDASeq which I can then analyse with plotPCA.
However the plotPCA function lacks the ability to fully adjust aesthetics. I was therefore wondering whether it is possible to change the dataset somehow so I can use it with e.g. ggplot2 or a different package that enables more adjustments.
I'm not familiar with EDASeq, but my best guess is that you'd have to do a PCA manually and plot those results. Assuming your object is called my_object, and going by the source code posted here, you can reconstruct the process as follows:
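library(EDASeq)
library(ggplot2)
dat <- normCounts(my_object)
# Center each row; apply() returns the result transposed, so samples end up in rows
dat <- apply(dat, 1, function(y) scale(y, center = TRUE, scale = FALSE))
s <- svd(dat)
df <- data.frame(
  PC1 = s$u[, 1], PC2 = s$u[, 2]
)
ggplot(df, aes(PC1, PC2)) +
  geom_point()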
Note that I haven't tested this code as I don't have example data and was too lazy to install EDASeq.

matplot pandas plotting multiple y values on the same column

I am trying to plot with matplotlib, but with separate lines based on the value of a column that is neither x nor y.
For example this is my DF:
code  reqs    value
AGB   253319  57010.16528
ABC   242292  35660.58176
DCC   240440  36587.45336
CHB   172441  57825.83052
DEF   148357  34129.71166
Which yields this plot df.plot(x='reqs',y='value',figsize=(8,4)) :
What I'm looking to do is have a plot with multiple lines, one line for each of the codes. Right now it's just drawing one line and ignoring the code column.
I tried searching for an answer, but each one is asking about multiple y's. I don't have multiple y's; I have the same y but with different focuses.
(Surely I'm using the wrong terms to describe what I'm trying to do; hopefully this example and image make sense.)
The result should look something like this:
So I worked out how to do exactly that, if anyone is curious:
import matplotlib.pyplot as plt
plt_df = df
fig, ax = plt.subplots()
# Draw one line per code on the same axes
for key, grp in plt_df.groupby('code'):
    ax = grp.plot(ax=ax, kind='line', x='reqs', y='value', label=key, figsize=(20,4), title="someTitle")
plt.show()
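An alternative to the loop, sketched here under assumptions: reshape the frame so that each code becomes its own column, then let pandas draw one line per column. The data below is made up for illustration (two codes with three rows each), not the data from the question.

```python
import pandas as pd

# Hypothetical data: made-up numbers, two codes with three rows each.
df = pd.DataFrame({
    "code":  ["AGB", "AGB", "AGB", "ABC", "ABC", "ABC"],
    "reqs":  [100, 200, 300, 100, 200, 300],
    "value": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],
})

# Reshape to wide format: one row per reqs value, one column per code.
wide = df.pivot(index="reqs", columns="code", values="value")
print(wide)
```

Calling wide.plot() on the result should then draw one line per code, provided matplotlib is installed. Note that pivot raises an error if the same (reqs, code) pair appears more than once, so the groupby loop is the safer choice for data with duplicates.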

Convert date/time index of external dataset so that pandas would plot clearly

When you already have a time series data set that uses the internal dtype for its date/time index, you seem to be able to plot the index cleanly, as here.
But when I already have data files with a date & time column in its own format, such as [2009-01-01T00:00], is there a way to convert it into an object that the plot can read? Currently my plot looks like the following.
Code:
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dir = sorted(glob.glob("bsrn_txt_0100/*.txt"))
gen_raw = (pd.read_csv(file, sep='\t', encoding = "utf-8") for file in dir)
gen = pd.concat(gen_raw, ignore_index=True)
gen.drop(gen.columns[[1,2]], axis=1, inplace=True)
#gen['Date/Time'] = gen['Date/Time'][11:] -> cause error, didnt work
filter = gen[gen['Date/Time'].str.endswith('00') | gen['Date/Time'].str.endswith('30')]
filter['rad_tot'] = filter['Direct radiation [W/m**2]'] + filter['Diffuse radiation [W/m**2]']
lis = np.arange(35040) # used the number of rows, checked by printing. This is for 2009-2010.
plt.xticks(lis, filter['Date/Time'])
plt.plot(lis, filter['rad_tot'], '.')
plt.title('test of generation 2009')
plt.xlabel('Date/Time')
plt.ylabel('radiation total [W/m**2]')
plt.show()
The other approach I had in mind was to use plotly. Yet again, its main purpose seems to be feeding data to the internet. It would be best if I were familiar with all the modules and tried for myself, but I am learning to use pandas and matplotlib as I go.
So I would like to ask whether anyone has experienced similar issues.
I think you need to hide most of the tick labels in a loop:
ax = df.plot(...)
spacing = 10
# Keep every 10th label visible and hide the rest
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)
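As for converting the strings themselves, here is a minimal sketch, assuming the 'Date/Time' column always follows the 2009-01-01T00:00 pattern shown in the question. The rows and the simplified rad_tot values below are made up for illustration.

```python
import pandas as pd

# Hypothetical rows in the 'YYYY-MM-DDTHH:MM' format shown in the question.
gen = pd.DataFrame({
    "Date/Time": ["2009-01-01T00:00", "2009-01-01T00:01",
                  "2009-01-01T00:30", "2009-01-01T01:00"],
    "rad_tot": [0.0, 1.0, 2.0, 3.0],
})

# Parse the strings into real timestamps and use them as the index.
gen["Date/Time"] = pd.to_datetime(gen["Date/Time"], format="%Y-%m-%dT%H:%M")
gen = gen.set_index("Date/Time")

# Keep only the readings taken on the hour or on the half hour.
half_hourly = gen[gen.index.minute.isin([0, 30])]
print(half_hourly)
```

With a DatetimeIndex in place, half_hourly['rad_tot'].plot(style='.') should let pandas and matplotlib pick sensible date ticks, instead of placing one label per row as plt.xticks(lis, ...) does.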