Standardisation of data entities is increasingly important in helping researchers and end-users navigate the world of brassica big data. Adoption of the FAIR (Findable, Accessible, Interoperable and Re-usable) data principles underlies the ability of humans, machines and cyborgs to make use of relevant data and to make meaningful connections. Several initiatives relevant to the brassica research community are underway, including the development of Trait Dictionaries and Ontologies, and the standardisation of gene nomenclature (names).
A standardised nomenclature for genes described within the Brassica genus was proposed by Lars Ostergaard (JIC, Norwich) and Graham King (Rothamsted Research). It allows a distinction to be made between gene copies associated with the different haploid genomes, as well as between paralogous loci. The nomenclature convention was discussed at the January 2008 MBGP Steering Committee meeting and then put out for wider consultation within the international research community. Useful feedback was obtained and incorporated where possible into the subsequent publication.
The standard nomenclature convention has now been circulated to editors of plant and genetics journals, as well as to GenBank/EMBL/DDBJ, so that usage is consistent across the literature and database repositories. The convention takes the form:
<GENUS 1 LETTER> [<species 2 letters>]<GENOME 1 LETTER>|<X>.<NAME 3-6 LETTER CODE>.<locus assignment 1 letter>
where < > encloses a category, [ ] marks an optional item and | denotes "or". Gene names are written in italics, whilst the corresponding protein names are not.
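For illustration only, the template can be expressed as a regular expression that splits a name into its fields. The sketch below is a minimal Python example, not an official validator. It assumes details the convention does not spell out: the genus and genome letters are upper case, the species code and locus letter are lower case, and the gene code is strictly 3-6 upper-case letters (real gene symbols containing digits would need a looser pattern). "X" is simply accepted as a genome letter. The names `GENE_NAME`, `BrassicaGeneName` and `parse_gene_name`, and the example inputs, are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Template from the convention:
# <GENUS 1 LETTER>[<species 2 letters>]<GENOME 1 LETTER>|<X>.<NAME 3-6 LETTER CODE>.<locus assignment 1 letter>
# Letter-case rules below are assumptions made for this sketch.
GENE_NAME = re.compile(
    r"(?P<genus>[A-Z])"       # genus, e.g. B
    r"(?P<species>[a-z]{2})?"  # optional two-letter species code
    r"(?P<genome>[A-Z])"      # genome letter, or X
    r"\.(?P<code>[A-Z]{3,6})"  # 3-6 letter gene code
    r"\.(?P<locus>[a-z])"      # single-letter locus assignment
)


class BrassicaGeneName(NamedTuple):
    genus: str
    species: Optional[str]  # None when the optional species code is omitted
    genome: str
    code: str
    locus: str

    def __str__(self) -> str:
        # Rebuild the name in the order given by the template.
        return (f"{self.genus}{self.species or ''}{self.genome}"
                f".{self.code}.{self.locus}")


def parse_gene_name(name: str) -> BrassicaGeneName:
    """Split a name into its fields; raise ValueError if it does not fit the template."""
    match = GENE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid gene name: {name!r}")
    return BrassicaGeneName(**match.groupdict())


if __name__ == "__main__":
    print(parse_gene_name("BraA.FLC.a"))  # hypothetical name with species code
    print(parse_gene_name("BX.FLC.b"))    # hypothetical name without species code
```

Because the species code is lower case and the genome letter upper case (under the assumed casing), the optional species field cannot be confused with the genome letter, so a single pattern handles both the long and short forms.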
Following discussions within the MBGP steering group and the publication of reference genomes in 2014, the following standard has been adopted for naming gene models assigned to pseudo-chromosome sequences:
<GENUS 1 LETTER> [<species 2 letters>]<GENOME 1 LETTER>|<X>.<Chromosome number (leading zero)>g<5 digit gene model number>.g<version number>g<1 LETTER designating Genotype/line/cultivar>