The Paper Is Dead

We publish around two to three million scientific papers every year, and that number is usually presented as evidence of progress. Science is accelerating, knowledge is expanding, and the system appears to be working exactly as intended. Yet if one pauses for a moment and takes that number seriously, a more uncomfortable question emerges: are we really producing millions of genuinely new ideas every year? It is difficult to believe that even a small fraction of those papers represent true conceptual breakthroughs.

At some point—often late at night—you find yourself scrolling through them anyway. You open one, skim the abstract, jump to the figures, glance at the conclusions, and move on to the next. After a while, a subtle but persistent feeling sets in. Nothing is obviously wrong, yet very little feels unfamiliar. You have seen something like this before—not the exact same system or dataset, but something close enough to make the experience feel repetitive. A slightly different material, a slightly adjusted parameter, a slightly reframed narrative. And eventually, the realisation becomes difficult to ignore: most papers are not actually new. They are not useless, and they are not incorrect, but they are rarely transformative.

This is not a failure of individual researchers; it is a consequence of how the system is structured. The publish-or-perish model does exactly what it was designed to do: it rewards output, and therefore output increases. Researchers respond rationally by extending existing frameworks, exploring parameter spaces, and producing technically sound work that contributes incrementally to the field. There is nothing inherently problematic about incremental progress; indeed, much of science has always advanced in this way. The issue arises when incremental work is consistently presented as novelty, and when the language of discovery is applied to contributions that are, in reality, refinements.

Over time, this leads to a quiet dilution of meaning. Novelty becomes a requirement rather than a property, and so it is identified, framed, or gently stretched until it satisfies expectations. A new dataset is positioned as novelty, a new geographical context becomes novelty, and a marginal performance improvement is described in the same vocabulary once reserved for genuine shifts in understanding. Authors are aware of this, reviewers recognise it, and editors encounter it daily. Yet the system continues unchanged, because careers depend not on absolute significance, but on the consistent demonstration of output.

What is far more striking, however, is not the inflation of novelty, but what is neglected in the process. Behind every paper lies the element that required the greatest effort: the data. The experiments, the measurements, the failures, and the iterations represent the real substance of scientific work, often accumulated over months or years. And yet, when the work is published, this substance is compressed into a small number of figures and embedded within a narrative designed for readability rather than reuse.

At that point, something rather peculiar happens. The figures are preserved, the text is archived, but the underlying data becomes effectively inaccessible. It is not lost in a literal sense, but it is locked—locked within images, locked within formatting, and locked within documents that were never intended to serve as reusable scientific objects. Anyone attempting to build upon such work often faces the choice of reconstructing the data manually or repeating the experiment entirely. In a field that depends on cumulative knowledge, this is a remarkably inefficient arrangement.

Science is fundamentally a data-driven activity, and yet its primary output remains a narrative document. In practice, this means that we have optimised the system for storytelling rather than for reuse. A paper, in this sense, is not a complete representation of knowledge; it is, more accurately, a lossy compression of a dataset. The decisions about what to include, what to exclude, and how to present it inevitably remove information that may later prove valuable.

There have been attempts to address this imbalance. The FAIR data principles promote the idea that data should be findable, accessible, interoperable, and reusable. In principle, these ideas are widely accepted. In practice, their implementation remains limited. Surveys of the published literature repeatedly find that only a minority of papers share data in a meaningful way, and even fewer provide code or fully reproducible workflows. Statements such as "data available upon request" are common, yet such requests are frequently ignored or declined. The gap between aspiration and reality is not subtle; it is structural.

The reasons for this are not difficult to identify. Preparing a dataset for reuse requires time, effort, and careful documentation, and these activities are rarely rewarded in the same way as publications. A paper contributes directly to career advancement, while a dataset often does not. As a result, researchers prioritise what the system values. The outcome is predictable: an ever-growing volume of papers, accompanied by a fragmented and underutilised body of data.

This situation becomes even more revealing when viewed in the context of recent advances in artificial intelligence. Tools now exist that can summarise literature, generate structured reviews, and even produce credible drafts of academic text within minutes. These tools do not replace expertise, but they do challenge long-standing assumptions about where intellectual value resides. If a passable paper can be generated with relatively little effort, then writing itself cannot be the defining contribution.

What remains difficult—and therefore valuable—is something else entirely. Designing meaningful experiments, choosing the right questions, interpreting unexpected results, and working with real systems all require forms of judgement that cannot be easily automated. In other words, the core of scientific work lies not in writing, but in decision-making and execution. These are the areas where expertise continues to matter, and where the future value of academia is likely to concentrate.

Seen from this perspective, the role of the academic begins to shift. Rather than being primarily a producer of polished narratives, the academic becomes a generator and curator of high-quality data, as well as a decision-maker about what is worth investigating in the first place. This reframing also changes how we interpret the value of existing publications. Even the most incremental paper may contain something of genuine importance—not necessarily in its claims, but in its data. Each experiment contributes a point to a larger landscape, and each dataset represents a fragment of a system that could reveal far more if properly connected.

In this sense, low-novelty papers are not the problem; they are underutilised datasets. The issue lies not in their existence, but in the way they are stored, presented, and ultimately forgotten. If science is to extract greater value from its own output, it must move beyond treating the paper as the final product. Instead, the paper should become an interpretive layer built upon a more fundamental object: a structured, accessible dataset.

Achieving this would require changes that are less technical than institutional. Data publication would need to be recognised and rewarded, reproducibility would need to be valued alongside novelty, and contributions would need to be assessed in terms of their long-term usability rather than their immediate narrative impact. Review papers, in particular, would need to evolve from summaries into curated databases, and experimental work would need to preserve raw measurements and context rather than only presenting refined results.

Ultimately, this shift would also require a degree of honesty. We are not producing millions of breakthroughs every year, and there is no reason to pretend otherwise. What we are producing is something different: a vast, complex, and often inconsistent body of data that has the potential to be extraordinarily valuable if it can be properly accessed and reused.

If science appears to be in decline, it may not be because we are generating less meaningful work. It may be because we are failing to extract the full value from the work we already do. The question, therefore, is not whether we should publish more or less, but whether we are prepared to reconsider what a publication represents. The future of science will not be determined by how many papers we produce, but by how much of what we produce can still be used once it has been published.

Bibliography

Michael Park, Erin Leahey, and Russell J. Funk (2023). Papers and patents are becoming less disruptive over time. Nature.

Jian Wang, Reinhilde Veugelers, and Paula Stephan (2017). Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy.

Mark D. Wilkinson et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data.

International Association of Scientific, Technical and Medical Publishers (2018). The STM Report.

Royal Society (2024). Science in the age of AI.