Transcriptomics and RNAseq data analyses

Typical alpine (left) and mountain (right) plants of Heliosperma pusillum.

This hands-on course introduces RNAseq, an NGS-based method for investigating gene expression and making functional interpretations. We illustrate the method with a system of mountain and alpine ecotypes of Heliosperma pusillum in the Alps (data from Szukala et al. 2023). The two ecotypes are ecologically and morphologically distinct, but previous results suggested that they have diverged from one another locally and independently multiple times.

Here we use a subset of the data: 3 individuals of each ecotype from 3 different regions. The seeds were germinated and the plants grown in a common garden before RNA was fixed, isolated and sequenced. The samples were sequenced on an Illumina HiSeq with 100 or 125 bp single-end reads.

For each analytical step, several alternative pipelines are possible. Although the reads, genes and annotations are genuine data, they have been artificially selected and organized solely to illustrate the bioinformatic approaches.

More details about our ongoing project, the relevant literature and a related press release are available here.

We aim here to:

illustrate standard analytical approaches for RNAseq;
find genes that are differentially expressed between the two ecotypes;
estimate the overlap in differentially expressed genes between the ecotypes in different regions (i.e., different origins);
discuss how independent the origins are and which ecotype is ancestral.

Typical habitats of the alpine (left: open, humid, above the timberline) and mountain (right: shaded, dry, below the timberline) ecotypes of Heliosperma pusillum

Setting up your space

Open a terminal (on macOS, from the Utilities folder in Applications; on Linux, press Ctrl+Alt+T or click the terminal icon). Log in to the cluster with ssh, following the instructions provided.

If you need to download a result file or PDF to your local computer, use scp from a terminal on your local machine.
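Both steps can be sketched as two small shell functions, assuming a bash-compatible shell on your local computer. `CLUSTER_USER`, `CLUSTER_HOST` and the helper names `cluster_login` and `fetch_result` are hypothetical placeholders, not part of the course setup; replace the values with the account and host from the login instructions.

```shell
# Run these on your LOCAL computer.
# Placeholders (hypothetical): fill in the values from the course instructions.
CLUSTER_USER="username"
CLUSTER_HOST="cluster.example.org"

# Open an interactive session on the cluster.
cluster_login() {
    ssh "${CLUSTER_USER}@${CLUSTER_HOST}"
}

# Copy one file from ~/students/<name>/ on the cluster into the current
# local directory. The remote path is relative to your home directory there.
fetch_result() {
    local name="$1" file="$2"
    scp "${CLUSTER_USER}@${CLUSTER_HOST}:students/${name}/${file}" .
}
```

For example, `fetch_result YourName Counts/plot.pdf` would copy a (hypothetical) plot from your Counts directory. Since scp pulls the file from the cluster, it has to be run on your own computer, not inside the cluster session.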

Let’s set up our workspace. First, navigate to the students directory:
cd ~/students

Then create a personal directory (replace YourName with your own name) and copy the input files into it:
mkdir YourName
cd YourName
cp ~/data/inc_small* ./
cp ~/data/145counts.txt ./
cp -r ~/data/Mapping ./

Let’s check what is in our folders and make directories for our results:
ls
mkdir ./Trimmed ./TrinityOut ./Counts
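The setup commands can also be wrapped in a single function, so that your name is typed only once. This is a sketch assuming bash and the ~/data layout used in this course; the function name setup_workspace is hypothetical. Using mkdir -p lets you re-run it without "File exists" errors, and it stops if any of the copies fails.

```shell
# Hypothetical helper: create ~/students/<name>, copy the course inputs into it
# and create the result directories. Leaves you inside the new directory.
setup_workspace() {
    local name="$1"
    [ -n "$name" ] || { echo "usage: setup_workspace YourName" >&2; return 1; }
    local ws="$HOME/students/$name"

    mkdir -p "$ws" || return 1
    cd "$ws" || return 1

    # Copy the read files, the count table and the mapping folder.
    cp ~/data/inc_small* ./ &&
    cp ~/data/145counts.txt ./ &&
    cp -r ~/data/Mapping ./ || return 1

    # Directories for the results of later steps.
    mkdir -p ./Trimmed ./TrinityOut ./Counts
    ls
}
```

Usage: `setup_workspace YourName`.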

Let’s first have a look at the read files. Why are there two files? How long are the reads? What is the name of the last read? How many reads are there in each file?
(Hint: use head, tail and wc -l or grep -c)
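To check your answers, here is one possible sketch, assuming the inc_small* files are plain FASTQ with four lines per read (files ending in .gz are decompressed on the fly). The function name fastq_summary is hypothetical. It counts reads as the number of lines divided by four rather than with grep -c '^@', because a quality line can itself start with @ and would then be counted as an extra read.

```shell
# Hypothetical helper: summarize one FASTQ file (4 lines per read assumed).
fastq_summary() {
    local f="$1" reader="cat"
    # Decompress gzipped files on the fly.
    case "$f" in *.gz) reader="gzip -cd" ;; esac

    # Number of reads: total lines divided by four.
    local lines
    lines=$($reader "$f" | wc -l)
    echo "$f: $(( lines / 4 )) reads"

    # Name of the last read: the header (first) line of the last record.
    echo "last read: $($reader "$f" | tail -n 4 | head -n 1)"

    # Read lengths: the sequence is line 2 of every record.
    # Output is "<number of reads> <read length>" per distinct length.
    $reader "$f" | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
}
```

Usage: `for f in inc_small*; do fastq_summary "$f"; done`. On large files each step reads the whole file again, which is fine for the small subsets used here.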