ArXiv introduces one-year ban for careless use of AI

One of the largest repositories of scientific preprints is introducing a one-year ban on researchers who submit articles showing clear signs of unedited AI-generated text. The offense is not using language models as such, but failing to check the work before submitting it.


What is ArXiv and why is it important?

ArXiv is an open-access preprint repository where researchers in fields such as physics, mathematics, and computer science publish their work before peer review. For more than three decades, it has been the primary channel for sharing new findings in these fields.

Articles on ArXiv are read and cited before they appear in academic journals, and often in place of them. As a result, fabricated references in an ArXiv paper can spread through the scientific literature just as easily as errors in peer-reviewed publications.

What exactly is against the rules?

Thomas Dietterich, chair of ArXiv's computer science section, announced that the ban applies when there is "irrefutable evidence" that the authors did not check the language model's output. Examples include references to publications that do not exist, chatbot instructions left in the text, and placeholder tables with notes such as "fill in with actual numbers from your experiment."

Once the section chair confirms a violation, the author receives a one-year ban. After the ban expires, the researcher's subsequent submissions will be eligible for the platform only after they have been accepted by a peer-reviewed journal.

The scope of the problem

Researchers at Columbia University examined 2.5 million biomedical articles and 126 million citations in PubMed Central. They found that the number of fake citations increased twelvefold between 2023 and 2026. In 2023, about one in every 2,828 articles contained a fake citation; in the first seven weeks of 2026, the figure was already one in every 277.

The study's authors attribute this surge to the widespread use of AI text-generation tools. Previous research indicates that between 30 and 69 percent of biomedical citations generated by language models are fabricated.
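As a quick check, the two per-article rates quoted above can be compared directly. This short Python sketch uses only those two figures. They imply roughly a tenfold rise in the share of affected articles, so the "twelvefold" figure presumably measures something else, such as the raw count of fabricated citations. That is an assumption, since the breakdown is not given here.

```python
# Per-article rates of fabricated citations, as quoted in the article.
# Assumption: these are shares of articles containing at least one fake
# citation; the study's "twelvefold" figure may be based on a different measure.
rate_2023 = 1 / 2828  # about one affected article in every 2,828 (2023)
rate_2026 = 1 / 277   # about one affected article in every 277 (early 2026)

# How many times more common an affected article became.
ratio = rate_2026 / rate_2023

print(f"2023: {rate_2023:.3%} of articles")
print(f"2026: {rate_2026:.3%} of articles")
print(f"increase: {ratio:.1f}x")
```

The ratio simplifies to 2,828 / 277, a little above ten.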

Principle, not technology

The new rules deliberately leave open whether AI tools may be used at all in writing academic papers. ArXiv targets only the most obvious violations, those that can be identified directly from the text, without relying on AI-content detectors, which are unreliable.

According to Thomas Dietterich, the principle is simple: if you submit an article, you are responsible for every word in it. Language models have made it incredibly easy to generate text that looks like science but contains nothing of substance. A one-year ban is a relatively mild sanction, but it marks the first formal response by a major scientific platform to a problem that is rapidly gaining momentum.

According to thenextweb.com 
