Our Discovery Mission
Our mission, should we choose to accept it, is to discover and document all remaining Australian species of plants, animals, fungi and other organisms ... in a generation.
Part 3 - Whiteboards
The ideas below have been contributed to the whiteboard for discussion during the roundtable breakouts. You can add to them on the Whiteboard ideas page.
Roundtable 3: How do we most effectively use DNA sequencing for rapid and robust species delimitation?
DNA sequencing will clearly play an important role in this mission - indeed, the mission would be impossible without it. Currently, some taxonomists have access to sequencing facilities while others do not, and some are skilled in all aspects of sequencing, bioinformatics and phylogenetics while others are not. This roundtable will consider issues around sequencing, including:
How can we ensure that sequencing speeds up, rather than slows down, species discovery and delimitation?
Should sequencing and bioinformatics support be more centralised or more dispersed than at present?
Should all taxonomists be trained in every step along the sequencing->bioinformatics->phylogeny->species delimitation pipeline, or should we specialise more?
How do we best balance the roles of short, cheap, universal sequences (barcodes) versus longer, more expensive but more informative sequences (up to and including complete genomes)?
Just noting that GBIF has already integrated molecular OTU identifiers from BOLD (BIN numbers) and UNITE into its backbone classification.
A centralised national DNA reference database, tied to collection specimens, that allows queries of unpublished data, thus enabling early data sharing and use (with appropriate attribution).
Aggregators (e.g. ALA, GBIF) use formally/traditionally named species to create their taxonomic backbone. Neither creates the taxonomy; both derive it from other sources. For ALA these are the AFD (Australian Faunal Directory), APNI/APC (Australian Plant Name Index / Australian Plant Census) and NZOR (New Zealand Organisms Register). None of these sources manages the myriad OTUs that will come from an increase in genetic studies. What should the rules be in the future?
* Should the architecture of current names sources be expanded to accept OTUs?
* Should something new be built?
Regarding holistic versus specialised training: yes, a practitioner should understand each stage, but I think greater diversification of active roles should be encouraged. Specialists could then provide support in a way that allows species description to be carried out on an "assembly line". This would also mean that specialisations that are less popular to study, or are seen as less lucrative career paths (morphological techniques, for example), would be more valued and less likely to be lost as older taxonomists retire.
Increased use of new sequencing technologies, such as Oxford Nanopore, for high-throughput multiplexed sequencing, to increase the number of samples that can be processed for species identification against curated databases.
Require authors to publish molecular data (standardised barcodes, MLST markers, etc.) alongside novel species descriptions. Develop barcodes/MLST markers from type material for previously described species whose collection specimens lack sequence data.
I think funding to support individual research labs to conduct all steps of the process is valuable. Undergraduate researchers in my lab are able to do a lot of great DNA barcoding-based work. This gives them a genuine appreciation of the process and a connection to the organisms. On the other hand, opportunities for labs to send plates of tissue for automated DNA extraction / PCR / sequencing protocols would be great if students were able to stay involved at least in the analysis step via something like BOLD.
In my opinion, having molecular data is incredibly valuable, but unless the goal is to reconstruct deep phylogeny or mine out particular genes of interest, shotgun sequencing is not a very valuable approach for biodiversity and taxonomy studies. There are a lot of lousy genomes out there. What we need are more good genomes from understudied taxa (especially non-insect and non-nematode invertebrates and "protists").
Build data repository capability for raw data to enable reuse of the data (e.g. with updated bioinformatics pipelines), and develop standards for documenting the data processing behind each deposited dataset.
Boldly embark on building automated, shared national bioinformatics pipelines, from marker development (e.g. from genomic/transcriptomic data, with developed marker systems shared openly in real time) through to shared analysis pipelines. More specialisation would ensure that a dedicated workforce updates and maintains these analytical tools as the field keeps evolving. This would reduce the need for taxonomists to keep up with technological and bioinformatic changes, allowing them to focus on interpreting results and describing species.
Should we prioritise sequencing types? This is a clear message I got from listening to Bryan's talk and those of other zoologists among the presenters. It makes complete sense, but I note that within the botanical community we are taking the opposite approach! For instance, in the GAP project, to build our reference phylogenomic tree of flowering plants in Australia, herbarium types are going to be the last specimens we resort to, when everything else (living, silica-gel and other more recent herbarium specimens) has failed.
1. Genetic reference from type material is needed.
2. More data about genetic variation across a species range is needed.
3. I wonder whether the often-observed incongruence between patterns of genetic divergence among species and morphospecies designations may relate to incompatible data. In corals, for example, we sequence regions such as the mtCR (mitochondrial control region), which may have many and variable functional roles, so we could be seeing signal related to rapidly evolving cellular processes. We still do not know which parts of the genome are responsible for coral biomineralisation and growth, but if R&D could identify markers related to growth form, those markers may be more compatible with the observed morphospecies patterns.
Understanding of international law relating to genetic materials, and of the provenance tracing it requires
Importance of collecting tissues with vouchers, maintaining identifiers, and tracking information
Working together to achieve taxonomic stability using molecules (rather than competitively)
To balance the roles of short barcodes versus marker systems with longer or more markers, maintain short-barcode approaches for taxonomic groups in which these still yield sufficient information, and progress to more elaborate marker systems as required (retaining the previous marker sets within the new sets so that the data remain usable long-term). Discourage one-off datasets, i.e. those which cannot easily be built upon (e.g. RADseq, DArT).
Fully transition to molecular marker systems that are suitable for collection specimens with degraded DNA. This will enable building a baseline genetic reference dataset for type species, specimens from type localities, and specimens of taxonomic synonyms and extinct populations. It will also allow existing collections from across a species' geographic range to be incorporated into molecular species delimitation and discovery (thus reducing the need for fresh field collections).
We need a process whereby new species can be identified and logged as (legally) acceptable even though they have not yet been formally described. This would be similar to the practice at the WA Herbarium, where, following a vetting process, new taxa are given a "phrase name" and then added to the list of biodiversity. There are many taxa on iNaturalist that are recognised as new but cannot be highlighted because they lack a "formal name". Allowing this will expand the number of new taxa being recognised and also allow better sharing of information across levels of understanding.
Ideally, future taxonomists should have all the skills needed to identify a species both morphologically and molecularly. To get there, we need DNA vouchers for as many described species as possible (ideally all of them), so that molecular identification becomes competitive and undescribed species can be recognised immediately.
Unfortunately, we are not at that stage yet. We still need DNA vouchers for a huge number of described species. To obtain them, we need morphological specimens of these species, which are often held in the personal collections of experienced taxonomists.
In my experience, it can be quite hard to convince some taxonomists to include DNA analysis, such as COI barcoding, in their work. Taxonomists who have never (or rarely) worked with DNA might feel overwhelmed, especially if they are at a stage of their career where they do not see the point in learning a new technique.
Often, these taxonomists are more 'morphology-oriented' and have great knowledge of certain groups, perhaps including a personal collection of specimens.
I think that Taxonomy Australia should act as a bridge between these taxonomists, with their knowledge and collections, and the researchers and institutions that can provide a barcode/sequence/genome for their samples. Using samples from collections that have a confirmed identification can prove invaluable for future molecular analysis, but we cannot expect everyone to start extracting DNA from their specimens.
- A policed database would prove far more beneficial than GenBank (especially in the new age of eDNA). Sequences on GenBank are not always reliable. Sometimes too much time and effort are required to verify whether uploaded sequences come from a peer-reviewed source and can be trusted. Additionally, sequences are not always uploaded consistently: species names, marker names and keywords vary from one submission to another.
- Having museum specimen sequences in a database could speed up species discovery if such databases existed and could be shared among researchers. A database where we could view images of the voucher specimen and then find its associated published marker sequences would be fantastic. This would provide a useful starting point for many marine taxonomic projects, because we would know that the specimen had gone through the museum identification workflow and had the corresponding molecular work attached. Access to such a reliable database would be excellent for eDNA pipelines, providing assurance for species identifications made through OTU (operational taxonomic unit: sequence data from eDNA workflows) searches.
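The two points above describe a curated, voucher-linked reference database that smooths over inconsistent marker names and supports OTU searches from eDNA workflows. A minimal Python sketch of such a lookup follows. It assumes that query and reference sequences are already aligned to a common length. The synonym table, the `ReferenceRecord` fields, the voucher IDs and the 97% identity threshold are all hypothetical choices for illustration; this is not the interface of BOLD, GenBank or any existing database.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical synonym table mapping marker-name variants seen in uploads
# to one canonical label (illustrative only, not an official vocabulary).
MARKER_SYNONYMS = {
    "coi": "COI", "co1": "COI", "cox1": "COI", "coxi": "COI",
    "16s": "16S", "16s rrna": "16S",
}


def normalise_marker(name: str) -> str:
    """Map a marker-name variant to its canonical label; unknown names pass through."""
    key = name.strip().lower().replace("-", "").replace("_", "")
    return MARKER_SYNONYMS.get(key, name.strip())


@dataclass(frozen=True)
class ReferenceRecord:
    voucher_id: str  # hypothetical museum catalogue number of the voucher specimen
    species: str     # name attached after the museum identification workflow
    marker: str      # marker name as submitted (may be a variant spelling)
    sequence: str    # assumed pre-aligned to the same length as query sequences


def identity(a: str, b: str) -> float:
    """Proportion of identical bases over sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        matches += int(x == y)
    return matches / compared if compared else 0.0


def assign_otu(otu_seq: str, marker: str, refs: list[ReferenceRecord],
               threshold: float = 0.97) -> tuple[str, str, float] | None:
    """Return (species, voucher_id, identity) for the best vouchered match on the
    same marker, or None if no reference reaches the (assumed) threshold,
    flagging the OTU as a candidate undescribed or unsequenced taxon."""
    target = normalise_marker(marker)
    best = None
    for ref in refs:
        if normalise_marker(ref.marker) != target:
            continue  # only compare like with like
        score = identity(otu_seq, ref.sequence)
        if score >= threshold and (best is None or score > best[2]):
            best = (ref.species, ref.voucher_id, score)
    return best
```

With toy references labelled `"CO1"` and `"cox1"`, a query submitted as `"COI"` is compared against both, because the variants normalise to the same label. A query with no reference above the threshold returns `None` rather than a forced match. Real pipelines would use proper alignment and search tools; the point here is only that normalised marker names and voucher links make the match traceable to a specimen.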