Use an object of S4 class SeqExpressionSet to plot PCA with ggplot2

I have made an object of S4 class SeqExpressionSet with EDASeq which I can then analyse with plotPCA.
However the plotPCA function lacks the ability to fully adjust aesthetics. I was therefore wondering whether it is possible to change the dataset somehow so I can use it with e.g. ggplot2 or a different package that enables more adjustments.

I'm not familiar with EDASeq, but my best guess is that you'd have to do the PCA manually and plot those results. Assuming your object is called my_object, and going by the source code posted here, you can reconstruct the process as follows:
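library(EDASeq)
library(ggplot2)
# Centre each gene (row); apply() returns the result transposed, i.e. samples x genes
dat <- normCounts(my_object)
dat <- apply(dat, 1, function(y) scale(y, center = TRUE, scale = FALSE))
# The left singular vectors hold the per-sample coordinates
s <- svd(dat)
df <- data.frame(
  PC1 = s$u[, 1], PC2 = s$u[, 2]
)
ggplot(df, aes(PC1, PC2)) +
  geom_point()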
Note that I haven't tested this code, as I don't have example data and was too lazy to install EDASeq.
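For anyone who wants to check the linear algebra outside R, here is a rough NumPy sketch of the same centre-then-SVD idea. It is not EDASeq's implementation: the `counts` matrix is made-up Poisson data (genes in rows, samples in columns), and all variable names are my own.

```python
import numpy as np

# Hypothetical stand-in for normalised counts: 200 genes x 6 samples
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(200, 6)).astype(float)

# Transpose to samples x genes, then centre each gene (column) at zero
Y = counts.T - counts.T.mean(axis=0)

# Thin SVD: the columns of U are the per-sample coordinates used for plotting
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
pc1, pc2 = U[:, 0], U[:, 1]

# Share of the total variance carried by each component
explained = S**2 / np.sum(S**2)
```

Each row of `U` corresponds to one sample, so `pc1` and `pc2` can go straight into a scatter plot, and `explained` can be used to label the axes.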

Extract ggplot smoothing function and save in dataframe

I am trying to extract the smoothing function from a ggplot and save it as a data frame (hourly data points). Plot shown here.
What I have tried:
I have already tried different interpolation techniques, but the results are not satisfying.
Linear interpolation causes a zigzag pattern.
Na_spline causes a weird curved pattern.
The real data behaves more like the geom_smooth curve from ggplot. I have tried to reproduce it with the following functions:
loess.data <- stats::loess(Hallwil2018_2019$Avgstemp~as.numeric(Hallwil2018_2019$datetime), span = 0.5)
loess.predict <- predict(loess.data, se = T)
But this returns a list that omits the NA values and is much shorter than the original data.
You can pass a newdata argument to predict() to get it to predict a value for every time period you give it. For example (from randomly generated data):
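df <- data.frame(date = sample(seq(as.Date('2021/01/01'),
                               as.Date('2022/01/01'),
                               by = "day"), 40),
                 var = rnorm(40, 100, 10))
mod <- loess(df$var ~ as.numeric(df$date), span = 0.5)
# Predict on a complete daily grid; the dates are converted to numeric to match the model.
# By default loess does not extrapolate, so days outside the sampled range come back as NA.
all_days <- seq(as.Date('2021/01/01'), as.Date('2022/01/01'), by = "day")
smoothed <- data.frame(date = all_days,
                       fit = predict(mod, newdata = as.numeric(all_days)))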

Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "NULL"

I am currently using the tidymodels package in R to study a multi-class classification problem. I have trained several models using workflow sets, and in my recipe I added a step, taken from there, that replaces NA values with a constant. The models I included in the workflow set are:
mlp <-
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
  set_engine('nnet') %>%
  set_mode('classification')
multinom <-
  multinom_reg(penalty = tune(), mixture = tune()) %>%
  set_engine('glmnet')
rand_forest <-
  rand_forest(mtry = tune(), min_n = tune()) %>%
  set_engine('ranger') %>%
  set_mode('classification')
tabnet <- tabnet(mode = "classification", batch_size = 126, virtual_batch_size = 128, epochs = 1,
                 num_steps = tune(), learn_rate = tune()) %>%
  set_engine("torch", verbose = TRUE)
For some models I tried a recipe with SMOTE ("themis" package), PCA, and normalisation (all in the same workflow by adding the steps to the original recipe). Training and testing went pretty well, so I tried an ensemble of these models (using the package "stacks"):
tidymodels_prefer()
stack1 <-
  stacks() %>%
  add_candidates(res_1)
set.seed(2002)
res1_stack <-
  stack1 %>%
  blend_predictions()
ens <- fit_members(res1_stack)
When I run this last operation (fit_members), I receive this error:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "NULL"
From reading this and this on GitHub, I figured out that the cause is the "constantimpute" step I added to the recipe. However, I don't know exactly how to fix it. Can someone help me?
Thank you very much!!!
Before using the filter function, make sure the table you want to filter is actually loaded.
Often the view() function has been applied to it, which prevents the table from being loaded into memory for use.

How to make a stacking bar using ggplot?

I have got this dataset. I am trying to do a stacking bar graph with proportions using ggplot for this data:
I am not really sure how to manipulate it into tables first! I only started learning R two weeks ago and I'm kind of stuck. I made a similar graph before, which I have attached here.
I'm not sure if I got your question right, but I'll try to answer it. I see that this is your first question in Stack Overflow, so I'd advise you to post a minimal reproducible example on your next question.
1) "I am not really sure how to manipulate it into tables first!"
Copy the data into an Excel file, save it as a CSV, and import it into R with the base R command:
df <- read.csv('your_data.csv')
2) "do a stacking bar graph with proportions"
Your problem is very similar to the one mentioned in this question. Make sure to check it out; I've already adapted the code below, so see if it works.
library(ggplot2)
library(dplyr)
library(tidyr)
df <- read.csv('your_data.csv')
# Add an id variable for the filled regions and reshape
dfm <- df %>%
  mutate(Domain = factor(row_number())) %>%
  gather(variable, value, -Domain)
ggplot(dfm, aes(x = variable, y = value, fill = Domain)) +
  geom_bar(position = "fill", stat = "identity") +
  # or:
  # geom_bar(position = position_fill(), stat = "identity") +
  scale_y_continuous(labels = scales::percent_format())

Convert date/time index of external dataset so that pandas would plot clearly

When you already have a time series data set whose index uses pandas' internal date/time dtype, you seem to be able to plot the index cleanly, as here.
But when the data files have a date/time column in their own string format, such as [2009-01-01T00:00], is there a way to convert it into an object that the plot can read? Currently my plot looks like the following.
Code:
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dir = sorted(glob.glob("bsrn_txt_0100/*.txt"))
gen_raw = (pd.read_csv(file, sep='\t', encoding = "utf-8") for file in dir)
gen = pd.concat(gen_raw, ignore_index=True)
gen.drop(gen.columns[[1,2]], axis=1, inplace=True)
#gen['Date/Time'] = gen['Date/Time'][11:] -> caused an error, didn't work
filter = gen[gen['Date/Time'].str.endswith('00') | gen['Date/Time'].str.endswith('30')]
filter['rad_tot'] = filter['Direct radiation [W/m**2]'] + filter['Diffuse radiation [W/m**2]']
lis = np.arange(35040) # used the number of rows, checked by printing. This is for 2009-2010.
plt.xticks(lis, filter['Date/Time'])
plt.plot(lis, filter['rad_tot'], '.')
plt.title('test of generation 2009')
plt.xlabel('Date/Time')
plt.ylabel('radiation total [W/m**2]')
plt.show()
The other approach I had in mind was to use plotly, but its main purpose seems to be publishing data online. Ideally I would get familiar with all the modules and work it out myself, but I am learning pandas and matplotlib as I go.
So I would like to ask whether anyone has experienced a similar issue.
I think you need to hide most of the tick labels in a loop, keeping only every n-th one visible:
ax = df.plot(...)
spacing = 10
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)
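As an alternative to hiding labels, it may be simpler to turn the strings into real timestamps so that pandas picks sensible date ticks on its own. Below is a minimal sketch, assuming the column is named 'Date/Time' and formatted like 2009-01-01T00:00 as in the question. The small `raw` frame is made-up sample data standing in for the concatenated files, and the names `ts` and `half_hourly` are my own.

```python
import pandas as pd

# Made-up sample rows standing in for the concatenated BSRN files
raw = pd.DataFrame({
    "Date/Time": ["2009-01-01T00:00", "2009-01-01T00:01",
                  "2009-01-01T00:30", "2009-01-01T01:00"],
    "Direct radiation [W/m**2]": [0.0, 1.0, 2.0, 3.0],
    "Diffuse radiation [W/m**2]": [10.0, 11.0, 12.0, 13.0],
})

# Parse the strings into timestamps and use them as the index
raw["Date/Time"] = pd.to_datetime(raw["Date/Time"], format="%Y-%m-%dT%H:%M")
ts = raw.set_index("Date/Time")

# Keep only readings on the hour and half hour (instead of str.endswith)
half_hourly = ts[ts.index.minute.isin([0, 30])].copy()
half_hourly["rad_tot"] = (half_hourly["Direct radiation [W/m**2]"]
                          + half_hourly["Diffuse radiation [W/m**2]"])
```

With a DatetimeIndex in place, `half_hourly["rad_tot"].plot(style=".")` should lay out the x-axis ticks by date without any manual `plt.xticks` call.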

Plotting Natural Earth features on a custom projection

I am trying to make some plots of sea ice data. The data is delivered in the EASE-North grid, an example file (HDF4) can be downloaded at:
ftp://n4ftl01u.ecs.nasa.gov/SAN/OTHR/NISE.004/2013.09.30/
I created a custom projection class for the EASE-Grid, it seems to be working (the coastlines align well with the data).
When I try to add a Natural Earth feature, it returns an empty Matplotlib figure.
import gdal
import cartopy
import numpy as np
import matplotlib.pyplot as plt

# projection class
class EASE_North(cartopy.crs.Projection):
    def __init__(self):
        # see: http://www.spatialreference.org/ref/epsg/3408/
        proj4_params = {'proj': 'laea',
                        'lat_0': 90.,
                        'lon_0': 0,
                        'x_0': 0,
                        'y_0': 0,
                        'a': 6371228,
                        'b': 6371228,
                        'units': 'm',
                        'no_defs': ''}
        super(EASE_North, self).__init__(proj4_params)

    @property
    def boundary(self):
        coords = ((self.x_limits[0], self.y_limits[0]),(self.x_limits[1], self.y_limits[0]),
                  (self.x_limits[1], self.y_limits[1]),(self.x_limits[0], self.y_limits[1]),
                  (self.x_limits[0], self.y_limits[0]))
        return cartopy.crs.sgeom.Polygon(coords).exterior

    @property
    def threshold(self):
        return 1e5

    @property
    def x_limits(self):
        return (-9000000, 9000000)

    @property
    def y_limits(self):
        return (-9000000, 9000000)

# read the data
ds = gdal.Open('D:/NISE_SSMISF17_20130930.HDFEOS')
# this loads the layers for both hemispheres
data = np.array([gdal.Open(name, gdal.GA_ReadOnly).ReadAsArray()
                 for name, descr in ds.GetSubDatasets() if 'Extent' in name])
ds = None
# mask anything other than sea ice
sea_ice_concentration = np.ma.masked_where((data < 1) | (data > 100), data, 0)
# plot
lim = 3000000
fig, ax = plt.subplots(figsize=(8,8),subplot_kw={'projection': EASE_North(), 'xlim': [-lim,lim], 'ylim': [-lim,lim]})
land = cartopy.feature.NaturalEarthFeature(
    category='physical',
    name='land',
    scale='50m',
    facecolor='#dddddd',
    edgecolor='none')
#ax.add_feature(land)
ax.coastlines()
# from the metadata in the HDF
extent = [-9036842.762500, 9036842.762500, -9036842.762500, 9036842.762500]
ax.imshow(sea_ice_concentration[0,:,:], cmap=plt.cm.Blues, vmin=1,vmax=100,
          interpolation='none', origin='upper', extent=extent, transform=EASE_North())
The script above works fine and produces this result:
But when I uncomment the ax.add_feature(land) line, it fails without any error and only returns an empty figure. Am I missing something obvious?
Here is the IPython Notebook:
http://nbviewer.ipython.org/6779935
My Cartopy build is version 0.9 from Christoph Gohlke's website (thanks!).
edit:
Trying to save the figure does throw an exception:
fig.savefig(r'D:\test.png')
C:\Python27\Lib\site-packages\shapely\speedups\_speedups.pyd in shapely.speedups._speedups.geos_linearring_from_py (shapely/speedups/_speedups.c:2270)()
ValueError: A LinearRing must have at least 3 coordinate tuples
Examining the 'land' cartopy.feature reveals no issues: all polygons pass .isvalid() and all rings (exterior and interior) have 4 or more tuples. So the input shape doesn't seem to be the problem (and it works fine in PlateCarree()).
Maybe some rings (for example in the southern hemisphere) get 'corrupted' after transforming to EASE_North?
edit2:
When I remove the built-in NE features and load the same shapefile (but with anything below 40N clipped), it works. So it seems to be some sort of reprojection issue.
for state in shpreader.Reader(r'D:\ne_50m_land_clipped.shp').geometries():
    ax.add_geometries([state], cartopy.crs.PlateCarree(),facecolor='#cccccc', edgecolor='#cccccc')
I'd say this is a bug. I'm guessing add_feature updates the matplotlib viewLim, with the result that the picture zooms in to a tiny area (which appears white unless you zoom out a lot).
Off the top of my head, I think the underlying behaviour has been improved in matplotlib, but cartopy is not yet making use of the new viewLim calculation. In the meantime I'd suggest setting the extents of your map manually with:
ax.set_extent(extent, transform=EASE_North())
HTH