Tuesday, 5 February 2013

The citation gap and its effects on taxonomy.

Ask any taxonomist, "Who's doing taxonomy nowadays?", and they'll tell you the same story: hardly anyone.  Funding for taxonomy is drying up in favour of research that'll turn a quick buck or research that explains rather than describes, or, crucially, research that brings in big overhead funding to institutions.

Look in many university biology departments to see who's teaching taxonomy, and the answer is: nobody.  If they have a systematist or two, most likely they'll be doing and teaching molecular phylogenetics.  I'm not knocking molecular phylogenetics; it's critically important work, but there are very few people now cataloguing and describing species, providing identification tools, curating collections, and databasing biodiversity, nor is anyone teaching those skills.

Ask a recent graduate what they learned about biological identification, classification, diversity, and even phylogeny, and likely you'll get a blank look.  Ask what the "C" in C. elegans or the "E" in E. coli stands for.  What's the species name for the Drosophila or Arabidopsis they're experimenting with?  Yet they'll use and misuse organisms' names throughout their careers to retrieve and organise knowledge and to describe their own research.

Sometimes, I don't think our biological colleagues do us any favours either.  In private they might sneer at what we do: it's stamp collecting; only nerds read our papers; it's nit-picking name-changing.  But it's what they do in public, or more accurately what they don't do, that really undermines this essential discipline of science.

What they don't do is cite taxonomists' work.  This is perhaps largely unintentional, but it has a profound effect.  There are a number of reasons.

1.  Taxonomists, like other scientists, publish their research in peer-reviewed journals.  New species have to be named—so other scientists can find and refer to information about them—and they have to be classified—so we can understand what they're related to.  Peer review is a check (not always an effective one, as in any discipline) that the work has been done well.  But the geneticist in the lab or the ecologist or conservationist in the field rarely reads these original papers.  You'd need a whole library to cover the plants or animals of any field site.  So they use the secondary literature: Floras, field guides and on-line resources like databases.  And these are what they cite, if they refer at all to the taxonomic foundations of their research.  Often, large chunks of those Floras, field guides, and databases have been simply copied from the primary literature; and their authors may happily take credit, along with the glory that comes with being cited a lot.  But the original authors, the ones that did the primary research, may rarely be cited, and that affects their careers.  (The plagiarism isn't the main issue that concerns me here, but it does concern me deeply; even when the original source is cited I still see substantial copying as plagiarism.)

I can illustrate this citation gap from my own publications.  Only one has been hugely cited: Flora of New Zealand Vol. 4 (Webb et al. 1988).  (And I must add, in relation to my comment above, that we took great pains—more pains than most readers will ever be aware of—to make sure the descriptions and keys in that book were all original and accurate for New Zealand material.)  It's been cited over 660 times in the scholarly literature (Google Scholar), and tellingly, another 17,000 times in the literature that's not covered by Google Scholar.  At the other extreme, a paper Bill Sykes and I wrote (Sykes & Garnock-Jones 1979) in which we reclassified the native Eugenia maire as a species of Syzygium has been cited just twice in the scholarly literature; you'd think it was a worthless piece of research if you believed the standard measures.  Yet our findings have been used by other biologists; the name Syzygium maire has been used in 119 scholarly papers and in 12,000 other publications and on line.  That reclassification places swamp maire more accurately among its relatives (including the clove); it's not trivial.

2.  There's a convention in biological publishing, which most journals blindly (and perhaps ignorantly) insist their authors follow (Garnock-Jones & Webb, 1996).  That is, when you use a biological name, you also must include the citation of the author who first used that name in that context (as if it were part of the name, which it isn't).  For example, in most scientific journals the name Ranunculus altus must be extended by adding the abbreviation of the describing author's name, in this case me: Ranunculus altus Garn.-Jones.  Why do it?  Well, it's a short-hand bibliographic citation that was used by the early taxonomists like Linnaeus, Banks, de Candolle and others to refer to each other's work in the days before scientific journals.  It doesn't have much meaning nowadays except for the rare circumstance where you need to distinguish between two plants that have the same name: Ranunculus altus Garn.-Jones is a different plant from Ranunculus altus (Julin) Ericsson.  But nowadays when we scientists are supposed to cite all our sources in the reference list at the end of the paper, the old-fashioned author citation convention lets researchers off the hook of citing their taxonomic sources.  And when the value of research work is (gu)estimated by the number of times it's cited in the scholarly literature, this puts taxonomists and taxonomy at a distinct disadvantage.

3.  Sometimes our work gets taken for granted or not cited for other reasons.  Maybe the names and classifications are just taken as a given that doesn't need to be referenced; maybe writers simply don't appreciate the research and expertise behind the names they're using.  Maybe they just don't think it's important enough.

Sometimes the journals themselves restrict the number of references that can be cited, and taxonomy can suffer through this too.  For example, a recent research paper in New Zealand used published phylogenetic taxonomy research as its raw data; the authors could not have done the study without it, but were any of those papers cited?  No, not one, unless you count citation in the supplementary data, which of course doesn't get noticed for the citation indices and impact factors that govern scientists' careers and universities' hiring decisions.  To be fair, those authors' hands were probably tied by the journal's reference limitation policy; it only allows a total of 12 cited references (a policy that might be deviation-amplifying, to the advantage of high impact journals).  But the authors presumably made the decision to cite their ecological colleagues in the "proper" references that count, and to relegate their taxonomic colleagues to the supplementary list that doesn't.  (I am not citing that paper here because it's unfair to single out just one.)

4.  The worth of scientific journals is quantified using "impact factors".  The impact factor of a journal in any year is the number of times its papers from the previous two years were cited that year, divided by the number of papers it published in those two years.  Because taxonomists' work doesn't get cited as often as it should (explained above), the journals that publish taxonomic research tend to have low impact factors.  I've heard it said that some "top" journals are no longer publishing taxonomic papers, because taxonomy lowers their impact factors, but I haven't seen proper evidence for this.  Publishing in low impact factor journals is the kiss of death to a research career.  When academics review applications for positions, their judgement of applicants' worth is heavily slanted by the impact factors of the journals they publish in; and looking those statistics up is an easy alternative to the hard work of actually reading the applicants' papers.  The same thing happens when researchers apply for funding.  And there too, the funding body needs to demonstrate its relevance by funding researchers and topics that are likely to get published in high impact journals.  Researchers who can't pull big grants and don't publish in top journals are dead in the water; worse, they're shark bait.
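
To make the impact factor arithmetic in point 4 concrete, here is a rough sketch in Python.  It assumes the usual two-year form of the calculation: citations received in one year to papers from the previous two years, divided by the number of those papers.  The function name, the journal names, and all the figures are invented for illustration; they are not real data.

```python
def impact_factor(citations_this_year: int, papers_prev_two_years: int) -> float:
    """Two-year impact factor (sketch).

    citations_this_year: citations received this year to papers the
        journal published in the previous two years.
    papers_prev_two_years: number of papers published in those two years.
    """
    if papers_prev_two_years <= 0:
        raise ValueError("journal must have published at least one paper")
    return citations_this_year / papers_prev_two_years


# Hypothetical journals: same output of papers, very different citation counts.
journals = {
    "Hypothetical Taxonomy Journal": (90, 120),
    "Hypothetical Ecology Journal": (600, 120),
}

for name, (citations, papers) in journals.items():
    print(f"{name}: {impact_factor(citations, papers):.2f}")
```

With the same number of papers, the journal whose papers are cited less often ends up with the lower score, whatever the quality or long-term usefulness of the work.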

How does this translate into employment trends and funding decisions?  The research success of academic departments and government research labs is increasingly judged by the quantity and quality of their researchers (in New Zealand the university research grading exercise is the PBRF and it's almost the only area of departmental funding that can go up or go down, now that income from student fees is capped) and by the external funding they can attract.  Journal impact factors are hugely important in the PBRF assessment.  PBRF scores directly affect the funding a department and a university will get, and in New Zealand, individual academics are graded, not the departments/disciplines as in the UK.  So departments are caught in a trap.  Even if they're wise enough to see the importance of taxonomy to their students' biological literacy, they simply can't afford to employ taxonomists any more; they're forced to go for a higher-profile discipline and maybe even drop their systematics courses.  Even though the lion's share of university funding is still given for teaching students, the discipline breadth and quality of teaching and the employment success and quality of graduates are rarely measured or questioned with anything like the effort that currently goes into the PBRF.

As systematics researchers are a dwindling pool, there are fewer people to cite their papers, unless ecologists and other biologists change their habits.

Why does this matter?  As the world's population grows and ecological relationships unravel under the stress, our understanding of the diversity of life on earth is becoming more and more critical.  Many countries face increasing extinctions of wildlife, which we wouldn't even be able to document, let alone avoid or repair, without taxonomy.  If that sounds like a bold claim, think about it for a while.  Without taxonomic descriptions, catalogues, and classifications of all those plants, animals, fungi, protists, and microbes, who could possibly notice that some are going extinct?  We wouldn't even have recognised extinction as a general phenomenon.

And then, where are the new foods and drugs going to come from?  Who will identify the pests and diseases that threaten agriculture, horticulture, and public health everywhere?  Who will recognise when a local species that's a canary for water quality or temperature rise is replaced by a similar-looking exotic pest with a wider tolerance?  It's not only the world's climate that's close to crisis point.

After I wrote the bulk of this post, two very relevant articles were published.

One claims there are more taxonomists and fewer undescribed species than we have estimated.  This would be good news if future estimates reach the same conclusions, but it does contradict what has been termed elsewhere a mass extinction of taxonomists.  (I do note the authors include untrained taxonomists in their numbers; that's a different issue, but, briefly, if taxonomy is to be practised as a science, it needs its practitioners to be trained in population genetics, evolution, statistics, comparative biology, and the scientific method, and to work in institutions that can archive their specimens and records, and provide collections, lab, and library support.  Many biologists' low opinion of taxonomy leads them to believe that anyone can do it without training and a thorough background in relevant science.)

The other article analyses the citation problem, giving reasoned and evidence-based support for much of what I'm saying here.  (Commendably, for a paywalled journal, this opinion piece is freely available.)

So here's a challenge to other biologists: if you value the taxonomic system that enables you to describe and interpret your research, cite at least one taxonomic paper that underpins each paper you publish, and commit to never omitting any relevant ones.  It's a small thing to do to support the discipline that supports your work.

References not hyperlinked in the text.

Garnock-Jones PJ, Webb CJ. 1996.  The requirement to cite authors of plant names in botanical journals.  Taxon 45: 285-286.

Sykes WR, Garnock-Jones PJ. 1979.  A new combination in Syzygium for Eugenia maire (Myrtaceae) of New Zealand.  Journal of the Arnold Arboretum 60: 396-401.


  1. I agree that taxonomy is incredibly important. I've made heavy use of the NCBI-Taxonomy throughout my career. Often this is the only way to link very disparate data sources. If the taxonomy is incomplete or wrong (which I'm sure it is, but less wrong than anything else) then this affects all sorts of downstream analyses.

    One way to raise the profile of primary literature is to make sure it is cited in the relevant Wikipedia articles. The top Google hit for almost any species name is going to be a Wikipedia article, unless the article hasn't been written yet. If the WP references are more complete, the profile of the field increases and the citations should flow. At least more than previously.

  2. Some thoughts.

    Taxonomy is doing fine

    If you are going to argue that taxonomy is in trouble you will have to address studies such as http://dx.doi.org/10.1016/j.tree.2011.07.010 showing that the rate of taxonomic description has increased, as has the number of taxonomists. The number of taxonomists doing just taxonomy has declined, but this is not the same as saying taxonomy is in decline.

    Does citing taxon authors matter?

    At what point do taxonomic names simply become part of background knowledge? Do we really need to cite the original description of Drosophila melanogaster every time we use that fly (assuming taxonomists see sense and leave the name of this fly alone)? The original description is here: http://biodiversitylibrary.org/page/15211574, and by itself has little content.

    If the name is germane to the topic, then citing the authorities makes sense (for example, if the name change is recent, meaning that relevant past literature won't be found by searching for the current name).

    Can we find the taxonomic citations?

    In an active research field we have rapid publication of short-lived articles. These articles are almost always now online, have DOIs, and are indexed by services such as PubMed and Google Scholar. Hence they are readily findable and - assuming you or your workplace has a subscription - accessible.

    This is not always the case for taxonomic literature. For example, the paper creating the new combination Syzygium maire was not, to my knowledge, online anywhere as an article until I added it to BioStor http://biostor.org/reference/117336 (you can also see it in the Biodiversity Heritage Library as part of the whole volume 60 of the Journal of the Arnold Arboretum http://www.biodiversitylibrary.org/page/9255823 ). Nor is it easy to find this reference. If you know about IPNI you can search there, but you get a citation like this: "J. Arnold Arbor. 60(3): 400 (1979)." (601904-1). This isn't an article citation in the sense that most people understand it. The actual article spans pages 396-401; IPNI simply records the page the name occurs on.

    So, the primary taxonomic literature is often hard to find, not online, and not cited by taxonomic databases in a form that others find useful. If taxonomy is serious about wanting people to cite primary descriptions it needs to tackle these problems.

    Why we cite

    Lastly, it seems to me the underlying issue here is that we are using metrics based on citation to capture the "value" of research, and taxonomists want to change people's citation practices to reflect what taxonomists see as the value of their work. I don't think this is a game taxonomists can win.

  3. For what it's worth:
    A call for the application of Appropriate Citation of Taxonomy was made by Seifert et al. 2008 (Persoonia 20: 105). In our journals Blumea and Persoonia we try to apply this principle. What it comes down to I quote from Blumea's Instructions for Authors:

    • All references cited in the text and in the synonymy of the species.
    • Wherever possible references for all DNA sequences used, even if these were downloaded from GenBank.
    • As much as reasonably possible, the original publications for all binomials used in the text, in particular when the binomials are recent. When large numbers of taxa are cited, e.g. for descriptions of vegetation, this requirement may be superseded by the next one.
    • Publications, revisionary or floristic, that form the basis of the taxonomy adopted even if only species names are used that derive their meaning from that publication. Thus, the use of a particular identification tool used in identifying material is to be considered sufficient reason to cite that tool. In the case of long-established species or long lists of species, the citation of a recent revision of this type may replace the citations for the original binomials.

    It would be interesting to find out now, after 5 years, whether it has made any difference to any living author. Hardly, I'm afraid.

  4. Excellent topic and write-up. Sadly, it's not just the non-taxonomists who are at fault for not citing taxonomic works. I've done a fair bit of cataloging and more often than not, when a work includes a synonymy list there is no citation as to where that list came from. Yes, we should cite the author(s) who first described a species if we are using that species name because not doing so is as bad as using someone's hypothesis without citing them, but a synonymy decision is also a hypothesis and takes work to generate. A synonymy list is often the accumulation of many people's efforts. Although it might be hard to properly cite everyone involved, at least the immediate source of the list(s) can be cited - perhaps an earlier catalog. If any new changes are made, or synonymy decisions were re-reviewed and confirmed, this information should be made clear to distinguish what is the current author(s)' work from that which was published previously.

    I say this in part because it's very hard to trace through the literature who did what with names if taxonomists aren't making this clear in their own works. Also, I had an entire catalog I published re-published by another team of authors verbatim, without citing my original. The production team didn't allow such citations! (And this wasn't a cheap knock-off, it was a high-profile European work).

  5. Thank you to the commenters above for adding value, depth, and perspective to my post.

    One more point that I intended to include is that taxonomic papers are often limited in their readership by taxon and by geography, yet their contribution is long-lasting. It's unrealistic to expect a taxonomic monograph to be highly cited within a few years of publication, but it might well still be cited a hundred years from now. Using recent citations as a measure of quality is fine in principle, but it must be less reliable for comparisons between (sub)disciplines, and it shouldn't be the only measure. As a colleague pointed out to me in a different forum, "Perhaps taxonomists need to emphasise their own metrics to demonstrate usage of the names they create (as one measure of the utility of their work)."

  6. Posted on the TAXACOM list after your blogpost was recommended there by Jim Croft:

    Nicely written blogpost, and I like the advice at the end, but I was stopped by this:

    'Why does this matter? As the world's population grows and ecological relationships unravel under the stress, our understanding of the diversity of life on earth is becoming more and more critical.'

    I have trouble seeing this as anything more than a non sequitur. Critical for what? Garnock-Jones goes on to suggest:

    (a) documenting (and maybe avoiding or 'repairing') extinction (which inevitably increases with population growth and more intensive Earth use)
    (b) finding new foods and drugs (which help increase population growth and more intensive Earth use)
    (c) identifying 'the pests and diseases that threaten agriculture, horticulture, and public health' (so that increasing population growth and more intensive Earth use aren't held back)
    (d) recognising 'a local species that's a canary for water quality or temperature rise is replaced by a similar-looking exotic pest with a wider tolerance' (i.e., better adapted to a world with explosive population growth and more intensive Earth use)

    Taxonomy doesn't help slow population growth or reduce destructive uses of the Earth. (b) and (c), above, make those problems worse, and (a) and (d) fall under the general heading of 'monitoring': documenting a catastrophe so that those people experiencing it can feel even worse about it, and so that anyone living post-catastrophe can see that we were aware of it but helpless to stop it.

    If taxonomy has a 'critical' role to play, it's in documenting what we're losing (biodiversity salvage), i.e. gathering up the most at-risk bits of the shrinking resource of biological information, simply because that information won't be around in future and will have value for anyone interested.

    We're busily and enthusiastically engineering a world with much lower biological diversity and an abundance of 'similar-looking exotic pest[s] with a wider tolerance' and much wider distributions. What 'critical' role can taxonomy play in this project, which has the support or grudging acceptance of nearly everyone?
