Citation Counts

Citation counts are a great article-level bibliometric. They don’t lie – they tell us how often a particular paper has been cited by other papers. However, and perhaps because this is such a convenient metric to generate, we may be tempted to extrapolate beyond the data and give false meanings to citation counts. Let’s think about how that works.

One of the challenges with all bibliometrics is that often we don’t pause to ask ourselves ‘What is the denominator?’ Are we only including citations in peer-reviewed journal articles or are we casting a broader net? Even if our sampling pool is limited to peer-reviewed journal articles, each indexing service has a different set of journals from which it samples. This is why, in general, citation counts from Web of Science are lower than those from PubMed. They are both true numbers; they are just drawn from different data sets.

The next challenge comes with the interpretation of citation counts. We can truthfully say that within a particular sampling pool, for example Web of Science or PubMed, paper A has more citations than paper B. There is not much room for argument here. However, citation counts are often used to infer the importance of a paper within the research community. It certainly seems reasonable to think that a paper with more citations has made a more significant contribution to science. However, this reasonable assumption has to be tempered with the understanding that, since citations accumulate over time, citation count also mirrors the age of an article, independent of the quality of the science behind it. And so, again, it is important to look to the denominator, and compare like with like. If we are going to compare citation counts of articles, then we should do so at defined time points, for example 2 or 5 years post-publication.
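To make that concrete, here is a minimal sketch (in Python, using entirely hypothetical data) of comparing two papers at the same post-publication age rather than on raw, open-ended counts. The function name, the two-year window, and the dates are assumptions for illustration only, not a standard tool.

```python
from datetime import date, timedelta

def citations_within_window(pub_date, citation_dates, years=2):
    """Count only the citations accrued within `years` of publication.

    Comparing articles at the same post-publication age avoids rewarding
    an older paper simply for having had more time to collect citations.
    """
    cutoff = pub_date + timedelta(days=365 * years)  # ignores leap days; fine for a sketch
    return sum(1 for d in citation_dates if d <= cutoff)

# Hypothetical example: paper A is older and has more total citations,
# but at the two-year mark the two papers look the same.
paper_a_pub = date(2015, 3, 1)
paper_a_cites = [date(2016, 1, 10), date(2016, 9, 5), date(2019, 4, 2)]

paper_b_pub = date(2020, 6, 1)
paper_b_cites = [date(2021, 2, 14), date(2021, 11, 30)]

print(citations_within_window(paper_a_pub, paper_a_cites))  # 2
print(citations_within_window(paper_b_pub, paper_b_cites))  # 2
```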

Along the same lines – comparing like with like – we may want to contextualize our citation counts according to field of study or genre. As an example of the latter, it is a well-known phenomenon that review articles ‘kill’ original data articles.1 That is to say, a case study or other clinical study can be critically important to our understanding of a particular disease – think of the discovery of HIV/AIDS. However, once a good review comes along, other authors are more inclined to cite the review than the original data article. If we are only using raw citation counts as our measure, this might cause us to over-value the contribution of the review article, and under-value the contribution of the case study. One way to correct for this is by supplementing the raw citation counts of original articles with a correction – a reward, if you like – for being cited by other highly cited articles.2
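As a rough illustration of that idea (not the CiteSpace approach cited above), the sketch below adds a small bonus to a paper’s raw count for each incoming citation, scaled by how highly cited the citing paper is. The weighting factor `alpha` and all of the numbers are hypothetical.

```python
def weighted_citation_score(citing_paper_counts, alpha=0.1):
    """Raw citation count plus a reward for being cited by highly cited papers.

    `citing_paper_counts` holds one entry per incoming citation: the citation
    count of the paper doing the citing. `alpha` is an arbitrary illustrative
    weight, not a published parameter.
    """
    raw = len(citing_paper_counts)
    reward = alpha * sum(citing_paper_counts)
    return raw + reward

# Hypothetical example: the same case study cited 10 times, first by modestly
# cited papers, then with two of those citations coming from highly cited reviews.
print(weighted_citation_score([3, 5, 2, 4, 1, 0, 2, 3, 1, 2]))      # 12.3
print(weighted_citation_score([3, 5, 2, 4, 1, 0, 2, 3, 150, 200]))  # 47.0
```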

There are many other considerations around the application of citation counts, but we will finish with one concern that applies to all bibliometrics and scientometrics: people learn to ‘game’ the system, and then the system has to adjust. One example is gratuitous self-citation intended to inflate the apparent importance of one’s own papers. None of this invalidates the citation count itself. It does mean, however, that we need to understand the subtleties of metrics and not extrapolate too far.

References

  1. Fu LD, Aliferis C. Models for predicting and explaining citation count of biomedical articles. AMIA Annual Symposium Proceedings 2008:222-226.
  2. Synnestvedt MB, Chen C, Holmes JH. CiteSpace II: visualization and knowledge discovery in bibliographic databases. AMIA Annual Symposium Proceedings 2005:724-728.
