Bioinformatics

Revision as of 18:59, 12 May 2017


Note: this page does not attempt to be comprehensive (e.g., if I say "Dolphin analyzes your RNASeq data", I do not mean to imply that it won't analyze any other type of data). There are many documents with much more detail. This page is just my attempt to summarize various concepts or issues that I've found confusing.

Dolphin

Dolphin will take your RNASeq fastq files (typically ending in .fastq or .fastq.gz), do some quality checking on them, align the reads to a reference genome, calculate estimates of gene and transcript abundance, and even do some differential expression calculations (e.g., which genes are significantly up- or down-regulated), although DEBrowser can do this in more detail.
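
If you want to see what is actually in a fastq file: in the usual layout each read takes four lines, a header starting with @, the sequence, a separator line starting with +, and a quality line with one character per base. The shell function below is a minimal sketch (first_record is a hypothetical name, not part of Dolphin) that prints the first read, decompressing on the fly when the file name ends in .gz.

```shell
# Hypothetical helper (not part of Dolphin): print the first FASTQ record,
# i.e. its first four lines: header (@...), sequence, separator (+...), qualities.
# Files whose names end in .gz are decompressed on the fly.
first_record() {
  local f="$1"
  case "$f" in
    *.gz) gzip -cd -- "$f" | head -n 4 ;;
    *)    head -n 4 -- "$f" ;;
  esac
}
```

For example, first_record myfile.fastq.gz prints the header, sequence, separator and quality lines of the first read.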

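First you need an account on Dolphin and an account on the Green High Performance Compute Cluster (GHPCC, http://wiki.umassrc.org/wiki/index.php/Main_Page), because that is where Dolphin does all its analysis; email hpcc-support@umassmed.edu to request the GHPCC account. See the Dolphin Quickstart Guide (http://dolphin.readthedocs.io/en/master/dolphin-ui/quickstart.html) or the GHPCC wiki (http://wiki.umassrc.org/wiki/index.php/Requesting_Access) for how to do that. Note: once you have a GHPCC account you can log in directly to the GHPCC if you want to manage your directories and file structure (see connecting to the cluster, http://wiki.umassrc.org/wiki/index.php/Connecting_to_the_Cluster).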

There are several ways to get files onto the GHPCC. You can use your favorite file transfer program; for example, from whichever computer has the files initially (substituting your own GHPCC username for ll36w and your own project directory):

scp myfile.fastq ll36w@ghpcc06.umassrc.org:/project/umw_lawrence_lifshitz/data/myfile.fastq

See also the GHPCC wiki on transferring data (http://wiki.umassrc.org/wiki/index.php/Transferring_Data). Dolphin may also be able to import them from your PC via an Excel spreadsheet...
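
Fastq files can be large, so it may be worth confirming that a copy arrived intact. One common approach, sketched below with hypothetical helper names, is to write checksums next to the files before the transfer and re-check them after the directory has been copied to the GHPCC. It assumes GNU md5sum (standard on Linux; a Mac may need GNU coreutils installed) and that the fastq files sit together in one directory.

```shell
# Hypothetical helpers (not part of Dolphin or the GHPCC tools).
# make_manifest DIR: write DIR/fastq.md5 with a checksum for every *.fastq*
# file in DIR, using names relative to DIR so the manifest still works after
# the directory has been copied somewhere else.
make_manifest() {
  (cd -- "$1" && md5sum -- *.fastq* > fastq.md5)
}

# verify_manifest DIR: re-check the files in DIR against DIR/fastq.md5.
# Prints only failures and returns non-zero if any file is missing or differs.
verify_manifest() {
  (cd -- "$1" && md5sum --quiet -c fastq.md5)
}
```

For example, run make_manifest on the data directory on your own computer, copy the whole directory with scp -r so that fastq.md5 travels with the data, then run verify_manifest on the copied directory on the GHPCC.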

Then you need to import the fastq files into Dolphin. When Dolphin does this, it copies them to an output ("Process") directory and does some simple checking of their format. Start up Dolphin (http://dolphin.umassmed.edu). There are two ways to import files: in the menu along the left, click either NGS tracking -> Excel Import or NGS tracking -> fastlane. Dolphin considers this import a type of "run". See [http: ] for more details.
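
Dolphin's own format checks are not described here, so the following is only an illustration of the kind of "simple checking" involved, written as a hypothetical shell function, check_fastq, using awk. It assumes the usual four-line records and flags a header that does not start with @, a separator that does not start with +, a quality line whose length differs from its sequence, or a file that ends part-way through a record.

```shell
# Hypothetical FASTQ sanity check (not Dolphin's actual code).
# Prints "OK" and returns 0 for a well-formed file; otherwise prints the
# first problem found and returns 1. Handles plain or gzipped (.gz) input.
check_fastq() {
  local f="$1"
  local prog='
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "line %d: header does not start with @\n", NR; bad = 1; exit
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "line %d: separator does not start with +\n", NR; bad = 1; exit
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
      bad = 1; exit
    }
    END {
      if (bad) exit 1
      if (NR % 4 != 0) { print "file ends in the middle of a record"; exit 1 }
      print "OK"
    }'
  case "$f" in
    *.gz) gzip -cd -- "$f" | awk "$prog" ;;
    *)    awk "$prog" "$f" ;;
  esac
}
```

Stopping at the first problem keeps the output short on a file with millions of reads; a real validator would likely check more (allowed characters, paired-file consistency, and so on).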

Now you want to align and count your data. Go to NGS tracking -> NGS Browser and pick the files ("samples") that you want to analyze (see [http: ] for more details). Then click Send to Pipeline. In addition to checking Yes for FastQC (see []), click Add Additional Pipeline and add RSEM (and click on RSEM QC with that option to get some quality control information about the run). Then click Submit Pipeline. Nothing may happen for about a minute (so don't keep clicking), and then you will be told your job is submitted. See [] for how to check on job ("run") status. It may take hours to days for your job to run, depending on its size.
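
Since run time depends on how much data you send, the number of reads in your samples gives a rough sense of job size. A minimal sketch (count_reads is a hypothetical helper, not a Dolphin or GHPCC command) that counts reads in each fastq file, plain or gzipped, as lines divided by four, and prints a total:

```shell
# Hypothetical helper (not part of Dolphin): print the number of reads in each
# FASTQ file given, then a total. Assumes four lines per read and that every
# line, including the last, ends with a newline. Files ending in .gz are
# decompressed on the fly.
count_reads() {
  local f n total=0
  for f in "$@"; do
    case "$f" in
      *.gz) n=$(gzip -cd -- "$f" | wc -l) ;;
      *)    n=$(wc -l < "$f") ;;
    esac
    n=$(( n / 4 ))               # four lines per FASTQ record
    printf '%s\t%d\n' "$f" "$n"
    total=$(( total + n ))
  done
  printf 'total\t%d\n' "$total"
}
```

For example, count_reads *.fastq.gz in your data directory lists each file's read count followed by the total.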

Finally (well, not really), once the job has run successfully, go back to NGS tracking -> NGS Browser (or NGS tracking -> Report Status) to find the files you want to run a differential expression analysis on. Each file ("sample") keeps track of which "runs" have been performed on it (e.g., whether RSEM has been run, so that a gene counts file is available). Select the files and either the expected_genes.tsv or the expected_transcripts.tsv output files to analyze (see [] for how to do this). Then go to Generate Tables, generate a table with all this data, and "Save" it within Dolphin (for future reference). Also save it to a local file on your PC (for future import into DEBrowser). Note: you CAN also send this table directly to DEBrowser, which is fine as long as you don't need to specify batch effects.
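
Before taking the saved table into DEBrowser, a quick sanity check can be worthwhile, e.g., that no sample has a far smaller total count than the others, which might point to a failed run or a mislabeled file. The awk sketch below (column_totals is a hypothetical helper) assumes a tab-separated file with a header row, the gene or transcript ID in the first column and one count column per sample after it; the exact layout of Dolphin's exported table is an assumption here, so adjust the starting column if it has extra annotation columns.

```shell
# Hypothetical helper (not part of Dolphin or DEBrowser).
# Assumed input: tab-separated, header row, ID in column 1, one sample per
# following column. Prints each sample name with the sum of its column.
# RSEM expected counts can be fractional, hence two decimal places.
column_totals() {
  awk '
    BEGIN { FS = "\t" }
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 2; i <= ncol; i++) sum[i] += $i }
    END { for (i = 2; i <= ncol; i++) printf "%s\t%.2f\n", name[i], sum[i] }
  ' "$1"
}
```

For example, column_totals mytable.tsv on the file you saved locally prints one line per sample.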

DEBrowser

Go to http://debrowser.umassmed.edu in your web browser (this will pop up automatically if you told Dolphin to send the table directly to DEBrowser). Then see the DEBrowser help for how to use it.

Resources

http://wiki.umassrc.org/wiki/index.php/Main_Page

https://github.com/Bioconductor-mirror/debrowser

http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

RSEM

http://dolphin.readthedocs.io/en/master/dolphin-ui/Dolphin-UI.html

http://bioinfo.umassmed.edu/

https://galaxy.umassmed.edu/

biocore@umassmed.edu