When scientific citations break down: uncovering “sneaked references”

The image of a researcher working alone, cut off from the world and from the rest of the scientific community, is a classic but false one. Research is in fact built on continuous exchange within the scientific community: first you understand the work of others, and then you share your own findings.

Reading and writing articles published in scientific journals and presented at conferences is a central part of a researcher's work. When researchers write a scientific article, they must cite the work of colleagues to provide context, credit sources of inspiration, and explain differences in approaches and results. Being cited positively by other researchers is a key measure of the visibility of a researcher's own work.

But what happens if this citation system is manipulated? In a recent article in the Journal of the Association for Information Science and Technology (JASIST), our team of scientific detectives, which includes information scientists, a computer scientist and a mathematician, uncovered an insidious method of artificially inflating citation counts by manipulating metadata: sneaked references.

Hidden manipulation

People are becoming increasingly aware of scientific publications and how they work, including their potential shortcomings. Last year alone, 10,000 scientific articles were retracted. The problems surrounding citation manipulation and the harm it causes to the scientific community, including damage to its credibility, are well documented.

Citations of scientific papers follow a standardized referencing system: each reference explicitly states at least the title, the names of the authors, the year of publication, the name of the journal or conference, and the page numbers of the cited publication. These details are stored as metadata. They are not directly visible in the text of the article but are attached to a Digital Object Identifier (DOI), a unique identifier for each scientific publication.

References in a scientific publication allow authors to justify methodological decisions or present the outcomes of previous studies, thus highlighting the iterative and collaborative nature of science.

However, we discovered by chance that, when submitting articles to scientific databases, some unscrupulous actors had added extra references that were invisible in the text but present in the articles' metadata. The result? The number of citations for certain researchers or journals skyrocketed, even though the authors had never cited these references in their articles.

Accidental discovery

The investigation began when Guillaume Cabanac, a professor at the University of Toulouse, wrote a post on PubPeer, a website dedicated to post-publication peer review, where scientists discuss and analyze publications. In the post, he described an inconsistency he had noticed: a Hindawi journal article that he suspected was fraudulent because of its awkward phrasing had far more citations than downloads, which is very unusual.

The post attracted the attention of several investigators, who are now the authors of the JASIST article. We used scholarly search engines to look for articles citing the original article. Google Scholar found none, but Crossref and Dimensions did find references. The difference? Google Scholar probably relies mostly on the main text of an article to extract the references that appear in its bibliography section, while Crossref and Dimensions use metadata provided by the publishers.

A new type of fraud

To understand the extent of the manipulation, we examined three scientific journals published by the Technoscience Academy, the publisher responsible for the articles with the questionable citations.

Our investigation consisted of three steps:

  1. We listed the references explicitly present in the HTML or PDF version of each article.

  2. We compared these lists with the metadata recorded by Crossref and discovered additional references that had been added to the metadata but did not appear in the articles.

  3. We checked Dimensions, a bibliometric platform that uses Crossref as a metadata source, and found further inconsistencies.

In the journals published by Technoscience Academy, at least 9% of the recorded references were “sneaked references.” These extra references appeared only in the metadata, skewing citation counts and giving certain authors an unfair advantage. Some legitimate references were also lost, meaning they appeared in the articles but were missing from the metadata.
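The comparison behind these figures can be sketched as two set differences. This is only a minimal illustration, not necessarily how the study's analysis was implemented. It assumes references can be matched by DOI, and the names `normalize_doi` and `compare_references` and the sample DOIs are hypothetical.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def compare_references(in_article, in_metadata):
    """Compare the references visible in an article with those in its metadata.

    Returns (sneaked, lost, sneaked_share):
      sneaked       - DOIs present in the metadata but not in the article
      lost          - DOIs present in the article but missing from the metadata
      sneaked_share - fraction of metadata references that are sneaked
    """
    article = {normalize_doi(d) for d in in_article}
    metadata = {normalize_doi(d) for d in in_metadata}
    sneaked = metadata - article
    lost = article - metadata
    share = len(sneaked) / len(metadata) if metadata else 0.0
    return sorted(sneaked), sorted(lost), share


# Hypothetical example data (10.5555 is used here only as a placeholder prefix).
visible_refs = ["10.5555/a1", "https://doi.org/10.5555/A2", "10.5555/a3"]
registered_refs = ["10.5555/a1", "10.5555/a2", "10.5555/x9"]

sneaked, lost, share = compare_references(visible_refs, registered_refs)
print(sneaked, lost, round(share, 2))
```

Matching on DOIs alone would miss references that have no DOI. A real audit would probably also need to compare titles or other bibliographic fields.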

Moreover, when we analyzed the sneaked references, we found that they greatly benefited some researchers. For example, a single researcher associated with the Technoscience Academy gained more than 3,000 additional illegitimate citations. Some journals from the same publisher gained a couple of hundred additional citations.

We wanted our results to be validated externally, so we posted our study as a preprint, informed both Crossref and Dimensions of our findings, and gave them a link to the preprint. Dimensions acknowledged the illegitimate citations and confirmed that its database reflects Crossref's data. Crossref also confirmed the extra references to Retraction Watch and stressed that this was the first time such a problem in its database had been reported to it. The publisher has taken action to correct the problem based on Crossref's investigation.

Impacts and possible solutions

Why is this discovery important? Citation counts heavily influence research funding, academic promotions, and institutional rankings. Manipulating citation counts can lead to unfair decisions based on false data. Even more worrying, this discovery raises questions about the integrity of the systems used to measure scientific impact, a concern researchers have been raising for years. These systems can be manipulated to encourage unhealthy competition among researchers, tempting them to take shortcuts to publish faster or get more citations.

To counteract this practice, we propose several measures:

  • Rigorous verification of metadata by publishers and agencies such as Crossref.

  • Independent audits to ensure data reliability.

  • More transparency in the management of references and citations.

This study is, to our knowledge, the first to report this kind of metadata manipulation. It also discusses the impact this may have on the evaluation of researchers. The study underscores once more that over-reliance on metrics to evaluate researchers, their work, and their impact can be inherently flawed and wrong.

Such over-reliance is likely to encourage questionable research practices, including forming hypotheses after the results are known, or HARKing; splitting a single dataset into multiple papers, known as “salami slicing”; data manipulation; and plagiarism. It also hinders transparency, which is the key to more robust and efficient research. Although the problematic citation metadata and the sneaked references now appear to have been fixed, the fixes, as is often the case with scientific corrections, came too late.

Image credit: theconversation.com