Fork me on GitHub

NGPhylogeny.fr

Documentation

Overview

NGPhylogeny.fr is a webservice dedicated to phylogenetic analysis. It provides a complete set of phylogenetic tools and workflows adapted to various contexts and various levels of user expertise. It is built around the main steps of most phylogenetic analyses:

Workflow

NGPhylogeny.fr integrates several tools for each steps of the workflow:

Workflow

Different ways of using NGPhylogeny.fr are offered, depending on the user needs or expertise:

  1. Oneclick workflows are already preconfigured with default options that should work on the majority of usecases. The only required input is the sequence data file in Fasta format. Input data type (dna or protein) is detected automatically;
  2. Advanced workflows have basically the same structure as oneclick workflows, but can be parametrized. It means that the user should customize the options of each step of the workflows: alignment, curation, tree inference.
  3. Workflow maker allows the user to choose the combination of tools that suits best his/her needs, and to customize the parameters.
  4. Individual tools may be run if specific taks are required.

Moreover, NGPhylogeny.fr provides a user-friendly visualization layer specific to the different kinds of data usually manipulated in phylogenetics (i.e. alignments, trees).

Finally, Blast-Search module is placed upstream phylogenetic workflows and aims at searching for sequences that are similar to a given user input sequence. Blast-Search then analyses Blast results and builds a quick (and inaccurate) tree in which users can remove unwanted sequences. Remaining sequences may then be used as input of any ngphylogeny.Fr workflows.

Branch supports

In addition to their respective bootstraps, almost all tree inference tools are proposed with the following branch support computations:

  1. Felsenstein Bootstrap Proportions (FBP);
  2. Transfer Bootstrap Expectation (TBE).

For example, it is possible to compute FBP and TBE supports with FastTree.

Bootstrap options are accessible via "Advanced workflows" and "Workflow maker".

Computations

NGPhylogeny.fr works together with Institut Pasteur Galaxy instance to:

  1. Manage tools and workflows;
  2. Run tools and workflows on the underlying computing cluster;
  3. Keep track of run histories.

Workflow

Oneclick workflows

One click workflows are accessible via the "Phylogeny Analysis/One click workflow" link on the tool bar:

One click workflows

The 4 oneclick workflows implemented in NGPhylogeny.fr differ by the tree inference tool:

  1. PhyML+SMS: This workflow uses PhyML+SMS to select the best evolutionary model and to infer the trees. However, it may not handle very large datasets, as the tree inference may take a very long time. SH-like aLRT branch supports are computed by default;
  2. PhyML: This workflow uses PhyML to infer trees. Default options depend on data type (dna, protein). Like PhyML+SMS, large datasets may not be analyzed with this workflow;
  3. FastME: This workflow infer trees using FastME. FastME provides distance algorithms to infer phylogenies and can work with large datasets;
  4. FastTree: This workflow runs FastTree to infer trees. "FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences" and works for very large datasets. Local branch supports (SH) are computed by default by FastTree.

Sections below describe these oneclick workflows and all the steps.

PhyML+SMS

PhyML+SMS workflow

Workflow outputs:

PhyML

One click workflows

Workflow outputs:

FastME

One click workflows

Workflow outputs:

FastTree

One click workflows

Workflow outputs:

Default options

MAFFT

BMGE

PhyML+SMS

PhyML

FastME

FastTree

Advanced workflows

Advanced workflows are accessible via the "Phylogeny Analysis/Advanced workflows" link on the tool bar:

Advanced workflows

Advanced workflows have the same structure as oneclick workflows, but some options can be customized.

In particular, it is possible to perform specific bootstrap analyses:

Workflow Maker

The workflow maker is accessible via the "Phylogeny Analysis/Workflow maker" link on the tool bar:

Workflow Maker

In this mode, users can choose the tool they want to run at each step of the workflow:

  1. Multiple alignment: Clustal omega; MAFFT; MUSCLE;
  2. Alignment curation: BMGE; Gblocks; Noisy;
  3. Tree inference: FastME; FastTree; MrBayes; PhyML+SMS; PhyML; TNT;
  4. Tree rendering: Newick Display.

In addition, parameters of each step should be specified.

Launch individual tools

Tools that make up the workflows can be configured and run independently. They are available via the "tools" link on the toolbar:

Workflow Maker

Blast-Search

Blast-Search is accessible via the "Phylogeny Analysis/Blast" link on the tool bar:

Workflow Maker

User just have to paste an input sequence, and specify a few options:

  1. Program: blastp, blastn, blastx, tblastn;
  2. Database: refseq, swissprot, nr, etc.;
  3. Max number of results: Only the specified number of closest sequences will be taken into account (based on bitscore);
  4. E.Value threshold;
  5. Query coverage threshold: percentage of aligned sequence compared to query length;
  6. Contact Email: Optional parameter, to be notified when analysis is over.

A Blast run will be launched on the Galaxy instance. The given number of best matches will be treated, and a neighbor joining tree will be computed from a distance matrix (Kimura distance for DNA sequences and Jukes Cantor distance for Protein sequences).

Once the neighbor joining tree is computed, it is displayed in a dynamic visualizer in which clades to remove can be selected individually. Clicking the "Delete selected sequences" button with then remove the sequences from the dataset.

This sequences can be used as input of any phylogenetic workflows by selecting the given blast analysis in the input panel, and then submiting the job:

Blast Run as Input

Typical analysis

Once the phylogenetic workflow is configured and launched, user is redirected to a waiting page giving informations about the run:

Workflow Maker

All workflows start by uploading input data to Institut Pasteur Galaxy server. Each step of the workflow is then put in pending status, waiting for available resources on the Galaxy server.

Workflow Maker

Once a step executed, corresponding result files are downloadable or viewable depending on the format. Images, trees, and alignments are viewable through specific viewers. In addition, trees may be uploaded to iTOL for further investigations.

Run NGPhylogeny.fr locally (Docker)

It is possible to run NGPhylogeny.fr via two docker images:

  1. NGphylgeny.fr docker image running a python/django web server;
  2. Galaxy docker image with all tools and workflows already installed (see this github repo).

docker run --privileged=true -p 8080:80 -p 8121:21 -p 8122:22 evolbioinfo/ngphylogeny-galaxy

docker run -p 8000:8000 evolbioinfo/ngphylogeny admin admin@admin http://host.docker.internal:8080 admin

docker run -p 8000:8000 --net=host evolbioinfo/ngphylogeny admin admin@admin http://localhost:8080 admin

Dataset size limitation

NGPhylogeny can handle large datasets. However, on the public server, we implemented limitations on the number and length of sequences that depend on the running tool. An error is displayed if the dataset is too large to be analyzed with the public instance of NGPhylogeny.fr.

In contrast, these limitations are not activated if you run NGPhylogeny.fr locally using Docker.

Example video