FAQs

Does Ensembl have promoters or regulatory regions?

Annotated promoters are not available for most genes in Ensembl. However, for mouse and human you can find related information in the Ensembl Regulatory Build. Data from genome-wide studies are used to generate Regulatory Features, which, amongst other classifications, can be annotated as 'Promoter Associated'. Regulatory Features are accessible as follows:

1. Turn on one of the Regulatory Features tracks in the Region in Detail view. See the 'Functional genomics' menu in the configuration panel. Clicking on a regulatory feature in this track will show a stable ID with a link to the Regulation view. This will reveal more of the data used to build the given Regulatory Feature, in different cell types.

2. The Gene tab also has a Regulation view (available in the left-hand menu). This page also displays regulatory motifs, where available, for mouse, human and fly (e.g. cisRED, miRanda miRNA targets, VISTA enhancers, and REDfly).

These data can also be accessed with the Perl API, which queries the functional genomics database, or with BioMart.

How do I convert IDs? I have ENSG... IDs and I would like HGNC symbols and EntrezGene IDs along with matching Affymetrix platform HC G110 probes.

This can be done using BioMart. We outline the protocol using Ensembl genes ENSG00000162367 and ENSG00000187048. We will enter the list of gene IDs as a filter and export the matching IDs from several other databases.

Database: Ensembl genes

Dataset: Homo sapiens genes

Filters: GENE: ID list limit box. Select Ensembl Gene ID(s) as the header and enter the gene IDs.

Attributes: EXTERNAL: External References. Select HGNC symbol and EntrezGene ID. Scroll down to EXTERNAL: Microarray Attributes and select Affy HC G110.

Click Results at the top.
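BioMart queries can also be written as XML rather than built in the web interface. The following is a minimal sketch of the query above, not a definitive recipe. The internal names used here (the dataset hsapiens_gene_ensembl, the filter ensembl_gene_id and the attributes hgnc_symbol, entrezgene and affy_hc_g110) are assumptions and may differ between releases, so check them against the BioMart interface before use.

```python
from xml.sax.saxutils import quoteattr


def build_biomart_query(gene_ids, attributes, dataset="hsapiens_gene_ensembl"):
    """Return a BioMart XML query string for the given gene IDs and attributes.

    The dataset, filter and attribute names are assumed internal BioMart
    names; they are not taken from the FAQ text.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">',
        f'  <Dataset name={quoteattr(dataset)} interface="default">',
        # Filter: restrict the results to the listed Ensembl gene IDs.
        f'    <Filter name="ensembl_gene_id" value={quoteattr(",".join(gene_ids))}/>',
    ]
    # One Attribute element per column wanted in the output table.
    lines += [f"    <Attribute name={quoteattr(name)}/>" for name in attributes]
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


query = build_biomart_query(
    ["ENSG00000162367", "ENSG00000187048"],
    ["ensembl_gene_id", "hgnc_symbol", "entrezgene", "affy_hc_g110"],
)
print(query)
```

The sketch only builds the query text; submitting it to a BioMart server is left out because the service address is not given here.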

For BioMart tutorials, see our video or BioMart FAQs.

I think my gene is wrongly annotated, or missing transcripts.

Ensembl determines genes using automatic annotation, which draws on both computational and biological expertise to determine an entire gene set. This is the Ensembl genebuild. Initial alignment of proteins/mRNAs leads to our transcript set, so all genes in Ensembl link back to protein/mRNA evidence, termed the supporting evidence. For an example see this page. Transcript information must be present in public biological databases such as EMBL-Bank, UniProt and NCBI RefSeq in order to be used to determine Ensembl genes. Click on External References from a gene page or General identifiers from a transcript page to see matching sequences across databases. If your sequence is not in these databases, consider submitting it to EMBL-Bank. For more transcripts, turn on Vega/Havana genes in the Region in Detail page. Please report any confusing gene annotation to our helpdesk.

How can I export data?

Export individual sequences or alignments using the Export link at the left of the gene, transcript, or location page. Alternatively, export in batch using BioMart. Perl programmers can use our API to access all Ensembl data. See here for more.

How do I view clone sets, such as 129/AB2.2 BACs?

About clones

Turn on clone tracks using the configure this page menu at the left of the region in detail or region overview pages under the location tab.

Clones are found in these menus: Misc regions, External data and Other DNA alignments. Select one or more sets of clones, then click SAVE and close. Clicking on a clone drawn in the region in detail or region overview view displays the accession number from the EMBL database.

Export clones using the export data link at the left.

The international BAC clone nomenclature is described here.

Ordering clones

Ensembl does not have clones for sale; however, there are several sources for ordering clones on the web. Try the clone registry. Individual libraries can be found here. Clones can also be ordered from imaGenes, C.H.O.R.I., and Geneservice. Clones from the Sanger Institute can be ordered here.

DIL NOD and CHORI-29 NOD BAC clones for mouse

Both sets can be found in the Other DNA alignments menu in the configure this page dialogue in location views. 129/AB2.2 BACs can be found in this menu, under the name M37-129AB22. See this contact page for the bMQ library 129/AB2.2 BACs. BAC end sequences have been deposited in the trace server.

Where are older or archive sites?

Click on the View in archive site link at the bottom of any page. Or, go to www.ensembl.org/info/website/archives/.

How do I see multi-species comparisons?

Click on the genomic alignments link from any gene or location page to view whole genome alignments for that region. Links at the left for gene trees, orthologues, paralogues and protein families offer sequence alignments at the gene level. Graphical views of the alignments, formerly known as the multicontigview and alignsliceview pages, are coming soon. These views are still available from our archive sites.

Can I view exons, introns, and flanking sequence to a transcript?

Yes! For a colour-coded sequence (previously known as ExonView) click on any transcript. From the transcript tab, click on the Exons link at the left. The Exons page allows you to view the transcript sequence, along with flanking and intronic regions. Click on the configure this page link at the left and customise your view. Or, try BioMart for sequence export.

I have a list of old Ensembl IDs from a previous release. How can I find their IDs in the current version?

The gene IDs might be the same in the current version. Search for the gene ID in the browser, or in BioMart. A gene ID can change if the gene structure changes dramatically, for example if a gene is split into two, or alternatively, two genes are merged into one.

If you have a list of IDs, submit them to our ID History converter. Click on the Manage your data link at the left of most Ensembl pages, and follow the link to the converter.

Or, view our older, archive sites.
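Before using the converter, it can help to separate IDs that still exist from those that need to be looked up. A minimal sketch, assuming you have exported the current gene IDs (for example from BioMart) into a list; the retired ID below is a made-up placeholder, not a real Ensembl ID.

```python
def split_by_presence(old_ids, current_ids):
    """Split old IDs into those still present and those missing today."""
    current = set(current_ids)
    unchanged = [gene_id for gene_id in old_ids if gene_id in current]
    missing = [gene_id for gene_id in old_ids if gene_id not in current]
    return unchanged, missing


# Hypothetical input lists, for illustration only.
old_ids = ["ENSG00000162367", "ENSG00000187048", "ENSG_PLACEHOLDER_RETIRED"]
current_ids = ["ENSG00000162367", "ENSG00000187048"]

unchanged, missing = split_by_presence(old_ids, current_ids)
print("Still present:", unchanged)
print("Submit to the ID History converter:", missing)
```

Only the IDs in the second list then need to go through the ID History converter.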

How can I download all the proteins in the human proteome? Do I use BioMart?

While BioMart can export protein sequences, the entire protein set for any species can be downloaded directly from our ftp site. Please use this method of download, as BioMart cannot handle the very large query of an entire proteome.
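Once the protein file has been downloaded, it can be read with a few lines of code. The following is a minimal sketch, assuming the download is a gzip-compressed FASTA file; the file name in the final comment and the example records are hypothetical.

```python
import gzip
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_proteins(path):
    """Count the records in a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in read_fasta(handle))


# Tiny in-memory stand-in for a downloaded file; IDs and sequences are made up.
example = io.StringIO(">EXAMPLE_PEP_1 pep\nMKT\nAYI\n>EXAMPLE_PEP_2 pep\nMSL\n")
records = list(read_fasta(example))
print(len(records), "records read")

# For a real download (hypothetical file name):
# count_proteins("Homo_sapiens.pep.all.fa.gz")
```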

If you do need a customized sequence header, consider splitting the BioMart query by chromosome. Make sure the compressed web file, notify by email option is selected.

For BioMart tutorials, see our video or BioMart FAQs.

What is the difference between Ensembl, Havana and Merged transcripts? And what does known and novel mean?

For human and mouse, Ensembl shows not only transcripts that are annotated automatically using the Ensembl genebuild pipeline, but also transcripts that are manually annotated by the Havana team. If the Ensembl and Havana annotations agree with each other, the transcripts are combined into a Merged transcript. When a transcript is annotated only by Ensembl or only by Havana, it is named an Ensembl or Havana transcript, respectively. Transcripts that match a species-specific entry in the UniProtKB/Swiss-Prot or RefSeq databases are categorised as known; those that do not are categorised as novel. For more detailed information, please have a look at our genebuild documentation.
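The naming rules above can be written as a small decision function. This is an illustrative sketch only: it reduces "annotated by Ensembl", "annotated by Havana" and "matches a species-specific entry" to boolean flags, which is a simplification of how agreement between annotations is actually assessed, and the function names are hypothetical.

```python
def transcript_source(in_ensembl, in_havana):
    """Return the source label for a transcript.

    in_ensembl: the automatic genebuild annotates this transcript.
    in_havana:  the Havana team annotates this transcript, in agreement
                with the automatic annotation if both flags are True.
    """
    if in_ensembl and in_havana:
        return "Merged"
    if in_ensembl:
        return "Ensembl"
    if in_havana:
        return "Havana"
    raise ValueError("a transcript needs at least one annotation source")


def transcript_status(matches_species_specific_entry):
    """Return 'known' if the transcript matches a species-specific
    UniProtKB/Swiss-Prot or RefSeq entry, otherwise 'novel'."""
    return "known" if matches_species_specific_entry else "novel"


print(transcript_source(True, True), transcript_status(True))
print(transcript_source(False, True), transcript_status(False))
```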

Where is the MeDIP data? How can I view and download functional genomics data... and where did it come from?

The DNA methylation profiles from MeDIP-chip and MeDIP-seq experiments are described in these publications:

MeDIP-chip: http://dx.doi.org/10.1101/gr.077479.108

MeDIP-seq: http://dx.doi.org/10.1038/nbt1414

Download these data from the EBI ftp site.

The methylation profiles are viewable as a DAS track in the region in detail page of the Ensembl browser. To find the region in detail, click on the Location tab.

The functional genomics track displays promoter and enhancer predictions based on data from DNase I, ChIP-chip and ChIP-seq experiments. Read more here.

These are the data in the regulatory regions track in the region in detail page. Data are accessible using the Perl API. Alternatively, download these data from the Ensembl ftp site. BioMart allows data mining of the functional genomics data.

How does Ensembl determine homology relationships?

For detailed documentation about the homology prediction pipeline, have a look at this article. Orthologues and paralogues are listed in the gene tab or viewable in the gene trees. Click on any node in the tree to export an alignment.

Trees are downloadable from our ftp site.

Please see the following reference for more: Vilella et al., EnsemblCompara GeneTrees: analysis of complete, duplication aware phylogenetic trees in vertebrates.

How do I get alignments of homologous proteins? Can I get the CDS (coding sequence) alignments as well? I'm using the API.

Yes, both can be obtained using the Compara API: see this example script.

Protein alignments of homologues are also available using the orthologues link from the Gene tab in the browser. The link at the bottom of the page allows a customised view of the protein alignments.

How can I obtain the conserved sequences calculated across multiple species, using the Compara API?

These are the constrained elements. See this example script to obtain them using the ensembl-compara API.

Can I view syntenic regions in Ensembl?

Click on the 'Synteny' link from any Region tab in Ensembl to view conserved blocks of sequences, for example here. Syntenic regions are calculated from the pairwise alignments.

I would like a list of homologues to my gene. Should I look at the gene trees or the families?

Although there is overlap, the EnsemblCompara MCL Families and Gene Trees are two different, complementary data sets.

To construct the Gene Trees, only the longest translation of each gene is included, and only species represented in Ensembl are used. However, the methodology has been specifically constructed to find homology relationships.
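The "longest translation of each gene" selection can be sketched as follows. This is an illustration of the idea rather than the pipeline's own code: the input layout, the tie-breaking rule (the first translation seen wins) and all identifiers are assumptions.

```python
def longest_translation_per_gene(translations):
    """Pick one translation per gene: the one with the longest peptide.

    translations: iterable of (gene_id, translation_id, peptide_sequence).
    Returns a dict mapping gene_id to the chosen translation_id.
    Ties keep the first translation encountered (an assumption).
    """
    best = {}
    for gene_id, translation_id, peptide in translations:
        current = best.get(gene_id)
        if current is None or len(peptide) > len(current[1]):
            best[gene_id] = (translation_id, peptide)
    return {gene_id: chosen[0] for gene_id, chosen in best.items()}


# Hypothetical identifiers and sequences, for illustration only.
example = [
    ("geneA", "pepA1", "MKTAYI"),
    ("geneA", "pepA2", "MKTAYIAKQR"),
    ("geneB", "pepB1", "MSL"),
]
print(longest_translation_per_gene(example))
```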

The families include all Ensembl transcripts plus the Uniprot/Swissprot and Uniprot/SPTREMBL peptides for all the metazoans, which duplicates the total number of peptides represented in the gene trees. These families are clustered using a Markov Clustering method, MCL.

You can view both using the gene tree, orthologues or paralogues, or protein family links from the Gene tab in the browser, or access both using the Compara API.

BioMart can be used to export homologues calculated from the gene trees.

What are the blast and MCL options used to determine the EnsemblCompara MCL Families?

The families are calculated with the following parameters.

For version v50 (and future versions), the blastall options are:

blastall -d $fastadb -i $qy_file -p blastp -e 0.00001 -v 250 -b 0

For the MCL clustering, the parameters are:

-I 2.1 -tf 'gq(50)' -scheme 6

For version v48 and previous versions, the parameters were:

-I 2.1 -P 10000 -S 1000 -R 1260 -pct 90
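If you want to rerun the v50 settings from a script, the options can be kept as argument lists and passed to a process launcher. A minimal sketch: the file names are hypothetical placeholders for $fastadb and $qy_file, and the input and output arguments that mcl itself requires are not given above, so they are left out here.

```python
import shlex


def blastall_command(fasta_db, query_file):
    """Build the v50 blastall command as an argument list."""
    return [
        "blastall",
        "-d", fasta_db,     # $fastadb in the command above
        "-i", query_file,   # $qy_file in the command above
        "-p", "blastp",
        "-e", "0.00001",
        "-v", "250",
        "-b", "0",
    ]


# MCL options for v50; as a list, 'gq(50)' needs no shell quoting.
MCL_OPTIONS_V50 = ["-I", "2.1", "-tf", "gq(50)", "-scheme", "6"]

cmd = blastall_command("peptides.fa", "query.fa")  # hypothetical file names
print(shlex.join(cmd))
print(shlex.join(MCL_OPTIONS_V50))
```

Keeping the options as lists avoids shell-quoting problems with the parentheses in gq(50); shlex.join is used only to print them in a copy-and-paste form.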

Where is the MICER resource for mouse?

The MICER clone set is available from any mouse location tab, in the region in detail. Turn on the MICER track as follows: click on Configure this page at the left of the region in detail view. Click on External data in the left menu of the panel. Turn on the DAS track named MICER clones. SAVE and close the panel. The region in detail view should now reload with the new track displayed.

What is a genome assembly?

The genome assembly is simply the genome sequence produced after chromosomes have been fragmented, those fragments have been sequenced, and the resulting sequences have been put back together. For more information, see the glossary.


Each species in Ensembl has a reference genome assembly that is produced by an international genome consortium. (Ensembl does not produce genome assemblies.) The reference assembly can be compiled from the DNA of one individual, a collection of individuals, a breed or a strain. This depends on the species. Find the DNA source of each genome sequence in the species home page.


Alternate (non-reference) assemblies, such as different individuals, breeds, strains, or haplotypic regions, can be viewed in the Ensembl browser where available.


A genome assembly is updated when DNA has been sequenced that allows gaps to be filled. It may also be updated when a new assembling algorithm is released. Assemblies are updated on the order of once a year, or less often, depending on the species. A new genebuild is performed by Ensembl when there is an update to the genome assembly or when large amounts of new experimental data become available (for example, cDNA and protein sequences).

Can I get the conservation scores (GERP scores) for nucleotides in whole genome alignments?

Yes. The conservation scores, described here, can be obtained by downloading the EMF files for multiple species alignments from our ftp site.

Alternatively, use the Compara Perl API to obtain the scores. An example script is provided here.

Which transcript should I use?

Different splice variants are found in different tissue types, developmental stages, etc. The sequence may be known to a high level of confidence, or it may be a transcript only sequenced once. It is often confusing to be presented with a list of transcripts. However, there are ways to identify the best choice for you.

Choosing a Transcript

  • The CCDS identifier for mouse and human transcripts is an indicator of a well-understood coding sequence.
  • A gold transcript indicates an identical coding sequence or cDNA sequence between two projects (Ensembl and VEGA/Havana). See article for more.
  • The General identifiers link in the transcript tab shows matches for the specific cDNA and/or protein in other resources. For example, the UniProt record contains information about the original submission, and tissue type information (if known) might be found in the submitting publication. Alternatively, if you already have an ID from NCBI, UniProtKB, or another project, try searching Ensembl with the ID to find the matching transcript.
  • If you already know the sequence of the transcript you are working on, BLAST it against the Ensembl cDNA set, or compare it with the Exons or cDNA sequence views found at the left of every transcript tab.

Can I install a local copy of the Ensembl database(s)?

Yes, you can. In fact, if you wish to run a script that queries a large amount of information, it is best to install a local copy. Instructions are here.

My human gene is on HSCHR6_MHC_COX. What is that?

The MHC region on human chromosome 6 is highly variable. The reference assembly reflects only one possible sequence at this position. Nine haplotypes are included in the human genome assembly hosted by the GRC to describe alternate sequences (with different allele combinations), and seven of these are in the MHC region. HSCHR6_MHC_COX is one of these haplotypes. Genes are annotated on both the reference sequence and the haplotypes.


If you have any other questions about Ensembl, please do not hesitate to contact our HelpDesk. You may also like to subscribe to the developers' mailing list.