Installing Packages and Bioconductor Overview

Overview

Teaching: 20 min
Exercises: 5 min
Questions
  • What is CRAN?

  • How do I install packages?

  • What is Bioconductor?

  • How do I install Bioconductor packages?

  • How do I find help in Bioconductor?

Objectives
  • Manage packages

  • Manage Bioconductor Packages

  • Navigate Bioconductor

  • Gain access to Bioconductor

  • Learn about packages to access data

  • Other output formats

R Packages and CRAN

You can extend R by writing a package or by installing one written by someone else. Packages are mainly distributed through centralized repositories. The first repository most users encounter is the Comprehensive R Archive Network (CRAN), home to many of the most popular R packages; as of this writing, it hosts over 17,000 packages. R and RStudio both provide functionality for managing packages:

From the console:
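A minimal sketch of the usual console commands, using only functions that ship with R. Here ggplot2 is just an illustrative package name, and the calls that would change your library are commented out so the block is safe to run as-is:

```r
# Which packages are installed? installed.packages() returns a character
# matrix with one row per package.
pkgs <- installed.packages()
head(pkgs[, c("Package", "Version")])

# Is a particular package installed?
"ggplot2" %in% rownames(pkgs)

# The calls below modify your library, so they are left commented out.
# install.packages("ggplot2")   # download and install from CRAN
# update.packages()             # update installed packages
# remove.packages("ggplot2")    # uninstall a package
```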

Using the RStudio interface

You can also use the RStudio interface to view and install packages. Pane 4 (which might be different for you if you’ve personalized RStudio’s interface) provides a Packages tab that shows which packages are installed and which are currently loaded. The Packages tab is divided into your User Library, which holds the packages you have installed yourself, and the System Library, which holds the packages that ship with R and are updated when you update your version of R.

[Figure: the default packages interface for RStudio]

To install a package, click Install.

In the pop-up window, type the name of the package you want to install.

Loading Packages

You can load an installed package in several ways. Most commonly you will call library() from the console or at the top of a script, since loading libraries should be the first thing a script does.

From the console:

library(<package>)

Using the RStudio interface: click the white check box next to a package in your library to load it into your session.

Tip: Beware of loading conflicts

Pay attention to the messages printed to the console when you load a package. Sometimes you will see The following objects are masked from.... This means the package you just loaded defines a function with the same name as a function from another package, and calling that name will now run the newly loaded version. Any code that depends on the original, masked function may then break. Beware!
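The masking behaviour can be reproduced without installing anything. The sketch below attaches a small list that stands in for a package; the name demo_pkg and its sd() function are invented for illustration only. It then shows how the package::function form selects the version you actually want:

```r
# Attach a list that defines its own sd(), standing in for a package whose
# function name clashes with stats::sd(). "demo_pkg" is a made-up name.
# R prints a "masked from 'package:stats'" message here.
attach(list(sd = function(x) "masked version"), name = "demo_pkg")

x <- c(1, 2, 3)
masked_result   <- sd(x)         # finds demo_pkg's sd() first
explicit_result <- stats::sd(x)  # package::function picks the one you mean

print(masked_result)
print(explicit_result)

# Show which names demo_pkg shares with other entries on the search path.
print(conflicts(detail = TRUE)$demo_pkg)

# Clean up so sd() refers to stats::sd() again.
detach("demo_pkg")
```

Writing stats::sd() is more verbose, but it keeps working no matter which packages are loaded afterwards.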

Exercise: Install and Load a package from CRAN

Install and load the ggplot2 package using the console, then install and load the dplyr package using the RStudio interface.

Solution

From the console:

install.packages("ggplot2") #installs ggplot2
library(ggplot2) #loads the package

Using the RStudio interface:

In the Packages tab in Pane 4, click the Install button and type dplyr. Then click the check box next to dplyr to load it.

About Bioconductor

Like CRAN, Bioconductor is a repository of R packages. Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. It uses the R statistical programming language, and is open source and open development. It has two releases each year and an active user community.

Installing Bioconductor

To install Bioconductor packages, you first need the BiocManager package, which is hosted on CRAN. To install it, run:

install.packages("BiocManager")
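Scripts are often run more than once, so it can be convenient to install a package only when it is missing. The helper below is a hypothetical sketch, not part of R or BiocManager. It relies on requireNamespace() to test whether a package is available and takes the installer as an argument so that it can be swapped out:

```r
# install_if_missing() is a hypothetical helper written for this example.
# It installs only those packages that cannot already be loaded.
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!present]
  if (length(absent) > 0) {
    installer(absent)
  }
  invisible(absent)  # return the names that needed installing
}

# Uncomment to run; this may download BiocManager from CRAN.
# install_if_missing("BiocManager")
```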

Bioconductor releases and current version

The Bioconductor project produces two releases each year, one around April and another one around October.

The April release of Bioconductor coincides with the annual release of R. Packages in that Bioconductor release are tested for the upcoming version of R. Users must install the new version of R to access the new version of those packages.

The October release of Bioconductor continues to work with the same version of R for that annual cycle.

Each time a new release is made, the minor version of all the packages in the Bioconductor repository is incremented by one.

Once the BiocManager package is installed, the BiocManager::version() function displays the version (i.e., release) of the Bioconductor project that is currently active in the R session.

BiocManager::version()
[1] '3.12'
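Because each Bioconductor release is tied to a particular R series, it can help to check which R you are running alongside the Bioconductor version. The sketch below uses only base R. The pairing of Bioconductor 3.12 with the R 4.0 series is stated here as an assumption for illustration; check the Bioconductor release notes for the release you care about:

```r
# Version of R running in this session
r_ver <- getRversion()

# Reduce it to its "major.minor" series, e.g. "4.0"
r_series <- paste(r_ver$major, r_ver$minor, sep = ".")

# Assumption for illustration: Bioconductor 3.12 pairs with the R 4.0 series.
expected_series <- "4.0"
matches_release <- r_series == expected_series

cat("R series:", r_series,
    "| matches the series assumed for Bioconductor 3.12:", matches_release, "\n")
```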

Installing Bioconductor packages

The BiocManager::install() function is used to install packages. The function first searches for the requested package(s) on the Bioconductor repository, but falls back on the CRAN repository and also supports installation from GitHub repositories. The Bioconductor maintainers have written a lengthy explanation of why this function is preferred over install.packages().

As an example, the BiocPkgTools package provides a collection of simple tools for learning about Bioconductor packages. We can install it like so:

BiocManager::install("BiocPkgTools")

Explore the package universe

library(BiocPkgTools) # loads BiocPkgTools
biocExplore() # interactive visualization of package info

Check for updates

The BiocManager::valid() function checks the version of currently installed packages, and checks whether a new version is available for any of them on the Bioconductor repository.

Conveniently, if any package can be updated, the function generates and displays the command needed to update those packages. Users simply need to copy-paste and run that command in their R console.

If everything is up-to-date, the function will simply print TRUE.

BiocManager::valid()

Bioconductor Help

Bioconductor stands apart from CRAN in that it requires packages to provide documentation and workflows. Bioconductor also runs a Bioconductor-specific support site that works much like Stack Overflow. The site is made possible by the contributions of developers in the Bioconductor community, as well as countless dedicated volunteers who answer questions.

Bioconductor Packages

Each package hosted on Bioconductor has a dedicated page with various resources. The scater package page on Bioconductor is a good example.

The Documentation section of a package page is the best place to start learning how to use the package.

Below this, the Details section covers finer nuances of the package, mostly its relationships to other packages.

For example, an entry called simpleSingleCell in the Suggests Me field on the scater page takes us to a step-by-step workflow for low-level analysis of single-cell RNA-seq data.

BiocViews

One additional Details entry, biocViews, shows how the authors have annotated their package. For example, the scater package is associated with DataImport, DimensionReduction, GeneExpression, RNASeq, and SingleCell, to name a few of its many annotations.

The BiocViews page provides a hierarchically organized view of the annotations associated with Bioconductor packages. Under the “Software” label, for example (which covers most Bioconductor packages), there are many different views for exploring packages. We can browse by the associated “Technology”, explore packages associated with “Sequencing”, and narrow these down further to “RNASeq”.

Another area of particular interest is the “Workflow” view, which provides Bioconductor packages that illustrate an analytical workflow. For example, the “SingleCellWorkflow” contains the aforementioned tutorial, encapsulated in the simpleSingleCell package.

Accessing Publicly Available Data

The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply Bioconductor tools to these data. GEOquery is the bridge between GEO and Bioconductor.

Getting data from GEO requires only one command, getGEO. This function interprets its input to determine how to get the data from GEO and then parses the data into useful R data structures. After installing GEOquery with BiocManager::install("GEOquery"), load the GEOquery library:

library(GEOquery)

Now you are free to access any GEO accession. In general, the accession ID is the only argument you need to supply.

gds <- getGEO("GDS507") # provide a GEO accession ID and assign the result to an object
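GEO accession IDs carry a prefix that indicates the kind of record: GDS for curated DataSets, GSE for Series, GSM for Samples, and GPL for Platforms. The helper below is a hypothetical illustration of reading that prefix in base R. It is not how GEOquery is implemented internally, and GSE2553 is used only as an example ID:

```r
# geo_accession_type() is a hypothetical helper for illustration only.
# It maps the three-letter prefix of a GEO accession to its record type.
geo_accession_type <- function(acc) {
  prefix <- toupper(substr(acc, 1, 3))
  types <- c(GDS = "DataSet", GSE = "Series", GSM = "Sample", GPL = "Platform")
  unname(types[prefix])  # NA for an unrecognised prefix
}

print(geo_accession_type(c("GDS507", "GSE2553")))
```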

To learn more take advantage of the GEOquery vignette:

browseVignettes(package = "GEOquery")
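If you would rather stay in the console than open a browser, base R's vignette() function lists the vignettes shipped with an installed package. The sketch below uses the grid package only because it ships with R; substitute "GEOquery" once it is installed:

```r
# List the vignettes of an installed package without opening a browser.
# "grid" is a stand-in here; replace it with "GEOquery" after installing it.
vigs <- vignette(package = "grid")

# The results are stored as a character matrix; keep the name and title.
vig_table <- vigs$results[, c("Item", "Title"), drop = FALSE]
print(vig_table)
```

To open a specific vignette, pass the name shown in the Item column to vignette(), together with the package argument.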

Key Points

  • Both CRAN and Bioconductor provide a large collection of packages to extend R.

  • Use install.packages() to install packages (libraries) from CRAN.

  • BiocManager::install() is the recommended way to install Bioconductor packages.