Abstract
Preregistration mandates—requirements imposed by journals and funding bodies that researchers publicly document hypotheses, methods, and analysis plans before data collection—have proliferated rapidly as instruments of open science policy. Yet the empirical evidence for their effectiveness remains thin, and scholars have raised persistent questions about whether formal compliance translates into genuine changes in research practice. This mixed-methods study addresses that gap directly. We draw on semi-structured interviews with 42 researchers across psychology, public health, economics, and education, combined with a bibliometric analysis of 618 preregistered and 618 matched non-preregistered articles published between 2015 and 2023. Results reveal a sharp and substantively important distinction between surface-level compliance—in which preregistration is treated as an administrative hurdle—and meaningful behavioral change that alters how hypotheses are formed, data are collected, and outcomes are reported. Bibliometric evidence shows that preregistered studies report null results at significantly higher rates and exhibit smaller average effect sizes than matched non-preregistered counterparts, consistent with reduced outcome-reporting bias. However, protocol deviation is common: approximately 43% of preregistered studies we analyzed contained at least one undisclosed departure from the registered plan. Qualitative data suggest that institutional incentive structures, particularly those tied to publication metrics, substantially constrain the degree to which mandates produce conviction rather than compliance. We argue that preregistration policy, as currently implemented in most contexts, functions primarily as a signaling mechanism rather than a behavioral intervention, and we propose structural reforms that could close the gap between procedural adherence and genuine scientific reform.
Keywords: preregistration, research quality, open science, scientific reform, research policy, reproducibility, publication bias
Introduction
Science has a credibility problem. Over the past fifteen years, a mounting body of evidence has documented that many published findings fail to replicate, that effect sizes in the literature are systematically inflated, and that the flexibility researchers retain in collecting, analyzing, and reporting data creates conditions for widespread false-positive results (Open Science Collaboration, 2015; Simmons et al., 2011). These revelations have not merely embarrassed particular fields—they have raised fundamental questions about the institutional arrangements through which scientific knowledge is produced, certified, and used.
Preregistration has emerged as one of the most widely adopted policy responses to this crisis. The logic is straightforward: if researchers publicly commit to hypotheses and analysis plans before seeing their data, the scope for post-hoc rationalization narrows. Reviewers and readers can distinguish what was predicted from what was merely observed, confirmatory from exploratory inquiry, and planned from unplanned analyses. Nosek et al. (2018) characterized this as nothing less than a "preregistration revolution," and the description is not hyperbolic in quantitative terms—the number of registered studies on platforms such as the Open Science Framework and AsPredicted has grown from the hundreds annually to the tens of thousands within a decade.
Journals across multiple fields have moved from treating preregistration as optional to requiring it. Funders in clinical research have mandated trial registration since the early 2000s, and this requirement has spread to social and behavioral sciences as institutional repositories have lowered the practical barriers to compliance. Registered Reports—a journal format in which peer review occurs prior to data collection, with acceptance contingent on study design rather than outcomes—now operate at more than three hundred journals (Chambers, 2013). The policy architecture, in short, is substantial.
What remains far less clear is whether these mandates change what researchers actually do. The question is not trivial. Policy scholars have long recognized that formal rules and behavioral outcomes frequently diverge, particularly in professional domains characterized by significant autonomy, complex incentive structures, and weak enforcement mechanisms (Kaplan & Irvin, 2015; Munafò et al., 2017). Preregistration requirements share several features with other science policy interventions that have produced compliance without transformation: they are typically retrospective in enforcement, they depend on self-reporting, and they do not directly alter the reward structures—grant success, publication in high-impact journals, promotion decisions—that drive researcher behavior in the first place.
A growing critical literature has begun to question preregistration on precisely these grounds. Pham and Oh (2021) argued that preregistration, in its dominant forms, is neither sufficient nor necessary for good science and may in some cases impede legitimate exploratory inquiry by imposing confirmatory framings on research that would benefit from flexibility. Others have noted that the qualitative and interpretive social sciences face particular difficulties adapting preregistration templates designed for experimental psychology (Haven & Van Grootel, 2019). There are also structural concerns: researchers who treat preregistration as bureaucratic box-ticking may produce worse science than those who engage with it seriously, while both groups appear in aggregate statistics as "preregistered."
This distinction between surface-level compliance and genuine behavioral change is the analytical core of the present study. We ask three related questions. First, how do researchers across disciplines describe and enact preregistration requirements in practice? Second, what does bibliometric evidence reveal about the relationship between preregistration and observable markers of research quality—specifically null-result reporting rates, effect size distributions, and fidelity to registered protocols? Third, what institutional conditions appear to promote genuine behavioral change rather than formal compliance?
Our analysis proceeds as follows. We describe the mixed-methods design, sampling strategy, and analytical approach. We then present results from both the qualitative and bibliometric components, maintaining transparency about where these two data sources converge and where they produce tension. The discussion situates our findings within broader debates about research policy and argues that preregistration mandates, as currently designed, function primarily as credentialing signals rather than behavioral levers. We conclude with specific proposals for policy reform.
Methodology
Study Design and Approach
This study employs a convergent mixed-methods design (Creswell & Plano Clark, 2018), in which qualitative interview data and quantitative bibliometric data were collected concurrently, analyzed separately, and integrated at the interpretive stage. The choice of design reflects the nature of the research questions: interview data can illuminate the reasoning, motivations, and contextual pressures that shape researchers' relationships with preregistration requirements, while bibliometric data provide aggregate, observable evidence about outcomes that researchers' self-reports may not capture accurately, whether from motivated reasoning, incomplete recall, or social desirability bias.
We treat the two components as equally weighted rather than treating one as primary and the other as supplementary. Where findings converge, we treat the convergence as strengthening our interpretive confidence. Where they diverge—which, as we show below, they do in at least one important domain—we treat the divergence itself as analytically informative.
Qualitative Component: Semi-Structured Interviews
We recruited 42 researchers using purposive sampling to achieve variation across four disciplinary clusters (psychology, public health, economics, and education), career stages (graduate student through full professor), and institutional types (research-intensive universities, teaching-focused institutions, and research institutes independent of universities). Recruitment proceeded through professional association listservs, targeted invitations to authors of preregistered papers indexed in OSF and AsPredicted, and snowball referrals. Informed consent was obtained from all participants. Institutional ethical review was conducted and approved prior to data collection.
Interviews were conducted via video call between January 2022 and August 2023 and lasted between 47 and 94 minutes. The interview guide covered: participants' first encounters with preregistration, their understanding of its purposes, how they describe the process of completing a preregistration in practice, experiences with deviation from registered plans, perceptions of how editors and reviewers engage with preregistrations, and views on whether preregistration has changed their research practice. The guide was piloted with three researchers not included in the final sample, and minor adjustments were made to improve question clarity.
Interviews were transcribed verbatim and analyzed using reflexive thematic analysis. Two researchers coded a 20% subsample independently, with initial disagreements resolved through discussion rather than by forcing artificial consensus. Themes were developed iteratively, moving between individual interview texts and the developing codebook across multiple rounds. Quotes used in reporting are lightly edited for readability (removing filler words) but without altering substantive content; all identifiers have been replaced with pseudonyms or role descriptors.
Table 1 summarizes participant characteristics.
| Characteristic | Category | n | % |
|---|---|---|---|
| Discipline | Psychology | 14 | 33.3 |
| | Public Health | 11 | 26.2 |
| | Economics | 9 | 21.4 |
| | Education | 8 | 19.0 |
| Career Stage | Graduate Student / Postdoc | 10 | 23.8 |
| | Assistant Professor | 14 | 33.3 |
| | Associate Professor | 10 | 23.8 |
| | Full Professor | 8 | 19.0 |
| Institution Type | Research-Intensive University | 27 | 64.3 |
| | Teaching-Focused Institution | 9 | 21.4 |
| | Independent Research Institute | 6 | 14.3 |
| Prior Preregistration Experience | Had preregistered at least one study | 38 | 90.5 |
| | No prior preregistration | 4 | 9.5 |
Quantitative Component: Bibliometric Analysis
The bibliometric sample comprised 618 preregistered empirical articles published in peer-reviewed journals between 2015 and 2023, each matched with one non-preregistered article from the same journal, published within the same 12-month window, in the same broad topical area. Preregistered articles were identified by searching OSF, AsPredicted, and ClinicalTrials.gov for completed registrations linked to published papers, supplemented by journal-specific searches in fields (such as clinical psychology and experimental economics) where preregistration disclosure conventions are relatively standardized. Matching was conducted using a coarsened exact matching strategy on journal impact factor quartile, study design (experimental vs. observational), and sample size quartile.
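The matching logic can be sketched as follows. This is a simplified illustration with hypothetical column names; the actual procedure also enforced the same-journal and 12-month-window constraints, which are omitted here for brevity.

```python
import pandas as pd

def cem_match(df: pd.DataFrame, seed: int = 42) -> pd.DataFrame:
    """1:1 matching within coarsened strata (impact factor quartile,
    study design, sample size quartile). `prereg` is a boolean flag;
    all column names here are hypothetical."""
    strata = ["journal_if_quartile", "design", "n_quartile"]
    matched = []
    for _, cell in df.groupby(strata):
        treated = cell[cell["prereg"]]
        controls = cell[~cell["prereg"]].sample(frac=1, random_state=seed)
        k = min(len(treated), len(controls))  # unmatched articles are dropped
        matched.append(treated.head(k))
        matched.append(controls.head(k))
    return pd.concat(matched, ignore_index=True)
```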
For each article in both groups, we extracted: (1) whether the primary outcome was statistically significant at conventional thresholds; (2) the reported effect size and 95% confidence interval for the primary outcome; (3) the number of outcomes reported in the paper relative to outcomes registered in the preregistration document (for the preregistered group); and (4) language indicating deviation from or elaboration of the registered plan. Coding was conducted by three trained research assistants blind to our hypotheses. Inter-rater reliability was assessed on a 15% subsample; mean pairwise Cohen's κ was .81, indicating strong agreement.
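For readers unfamiliar with the statistic, mean pairwise Cohen's κ for three coders can be computed as in the toy sketch below; this is a generic illustration, not our actual coding pipeline, and the data shown are invented.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def mean_pairwise_kappa(ratings: dict) -> float:
    """Average Cohen's kappa over all pairs of raters; `ratings` maps
    each rater to their codes for the same ordered set of items."""
    kappas = [cohen_kappa_score(ratings[a], ratings[b])
              for a, b in combinations(ratings, 2)]
    return sum(kappas) / len(kappas)

# Invented codes from three hypothetical research assistants:
ratings = {
    "ra1": ["sig", "null", "sig", "sig", "null"],
    "ra2": ["sig", "null", "sig", "null", "null"],
    "ra3": ["sig", "null", "sig", "sig", "null"],
}
print(round(mean_pairwise_kappa(ratings), 2))
```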
Protocol fidelity assessment—the degree to which published papers adhered to their preregistered analysis plans—was applied only to the preregistered group. We classified each study as exhibiting (a) full fidelity, (b) disclosed deviation (where departures from the plan were explicitly acknowledged in the paper), or (c) undisclosed deviation (where the published analysis differed from the registered plan without acknowledgment). Determining undisclosed deviation required systematic comparison of the registered document with the published methods and results sections, a labor-intensive process that limits the scalability of this kind of audit but that we regard as essential for the research questions we are pursuing.
Analytical Strategy
For the bibliometric data, we used logistic regression to model the binary outcome of null-result reporting (primary outcome non-significant at p ≥ .05), with preregistration status as the primary predictor and journal quartile, study design, and disciplinary cluster as covariates. Effect size comparisons used bootstrapped confidence intervals due to non-normality of the effect size distributions. Deviation rates were analyzed descriptively and stratified by discipline and journal type. For the qualitative data, themes were developed inductively and are reported in narrative form. Integration occurred at the interpretive stage through a joint display technique comparing thematic claims with corresponding bibliometric patterns.
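To fix ideas, the primary model can be sketched in a few lines. The sketch below is illustrative only: variable names are hypothetical, and the data are simulated at roughly the rates reported in the Results section rather than drawn from our coded sample.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a dataset shaped like ours: 618 preregistered articles plus
# 618 matched controls, with null-result rates near those we report.
rng = np.random.default_rng(0)
n = 1236
articles = pd.DataFrame({
    "prereg": np.repeat([1, 0], n // 2),
    "journal_quartile": rng.integers(1, 5, n),
    "design": rng.choice(["experimental", "observational"], n),
    "discipline": rng.choice(["psych", "pubhealth", "econ", "educ"], n),
})
p_null = np.where(articles["prereg"] == 1, 0.387, 0.143)
articles["null_result"] = rng.binomial(1, p_null)

# Logistic regression of null-result reporting on preregistration status,
# adjusting for journal quartile, design, and disciplinary cluster.
model = smf.logit(
    "null_result ~ prereg + C(journal_quartile) + C(design) + C(discipline)",
    data=articles,
).fit(disp=False)
print(np.exp(model.params["prereg"]))          # adjusted odds ratio
print(np.exp(model.conf_int().loc["prereg"]))  # 95% CI on the OR scale
```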
Results
Patterns of Surface-Level Compliance
The single most consistent pattern across our interviews was what participants themselves described as "going through the motions." Twenty-six of the forty-two participants (61.9%) described at least one instance—in their own practice or in their direct observation of colleagues—in which a preregistration was completed in ways that did not meaningfully constrain subsequent analytical choices. The forms this took varied considerably, but several recurrent patterns emerged.
The most common was what we term *post-hoc forward registration*: the practice of completing a preregistration document after key analytical decisions had already been made, or even after data collection had begun or concluded. Several participants described this without apparent embarrassment, framing it as a pragmatic response to the perceived arbitrariness of institutional requirements.
"We had already run a pilot—quite a large pilot, honestly—before the funder required us to preregister. So what we put in the document was basically a description of what we were already planning to do, based on what we'd already seen. I don't think that defeats the purpose entirely, but I'm not going to pretend it's what they're imagining when they talk about preregistration." (Associate Professor, Public Health)
A second form involved *strategic vagueness*: writing preregistration documents at a level of abstraction that preserved nearly complete analytical flexibility. Several participants—particularly those in fields where preregistration norms are newer and less developed—described receiving no meaningful guidance on the required level of specificity and defaulting to language that they acknowledged would not constrain any particular analytical choice.
"What does it mean to say you'll analyze the data 'appropriately'? It means nothing. But I've seen that in preregistrations, including ones that get accepted at decent journals. No one is actually reading these documents carefully. They're checking that a document exists." (Assistant Professor, Psychology)
This perception—that editors and reviewers do not carefully scrutinize preregistration documents—was widespread. Thirty-one participants (73.8%) expressed the view that preregistrations function primarily as signals of methodological seriousness rather than as binding commitments that reviewers verify against submitted manuscripts. This perception appears to be empirically grounded. Our bibliometric analysis found that only 12.4% of preregistered papers in our sample contained any explicit reference to their preregistration document in the results or methods sections beyond a pro forma link. This suggests that, for the majority of published preregistered studies, the registered document exists in a practical vacuum: its existence is disclosed, but its contents are not systematically integrated into the manuscript or the review process.
A third pattern concerned what might be called *outcome switching without acknowledgment*. Participants described cases—again both in their own practice and in observed colleagues' behavior—where the variable designated as the primary outcome in the preregistration was quietly relegated to secondary status when a different variable produced a more compelling result. This is not inherently problematic if disclosed and adequately motivated; as Wagenmakers et al. (2012) noted, distinguishing confirmatory from exploratory analyses within the same dataset can be scientifically valuable when done transparently. The problem is the absence of transparency.
"The preregistration said variable A was our primary outcome. When we ran the analysis, variable B was more interesting. So variable B became the focus of the paper. I think we mentioned variable A in passing. Is that wrong? I can tell you that the reviewers didn't ask about it." (Full Professor, Psychology)
This observation from the interviews is consistent with our bibliometric findings. Among the 618 preregistered papers, 43.4% (n = 268) exhibited at least one undisclosed deviation from the registered analysis plan. Deviations included shifts in primary outcome designation (26.9% of deviant papers), changes in analysis method without acknowledgment (34.7%), and inclusion of outcomes not registered in the original document without labeling them as exploratory (41.0%). These categories are not mutually exclusive; many papers exhibited multiple deviations simultaneously. Table 2 details these patterns.
| Deviation Type | n of papers with deviation | % of all preregistered papers | % of deviant papers |
|---|---|---|---|
| Undisclosed outcome switch | 72 | 11.7 | 26.9 |
| Undisclosed analysis method change | 93 | 15.0 | 34.7 |
| Unregistered outcomes reported without exploratory label | 110 | 17.8 | 41.0 |
| Sample size deviation without acknowledgment | 49 | 7.9 | 18.3 |
| At least one undisclosed deviation (any type) | 268 | 43.4 | — |
| Full fidelity to registered plan | 224 | 36.2 | — |
| Disclosed deviation only | 126 | 20.4 | — |
Genuine Behavioral Change: A Minority Account
The preceding picture should not obscure a genuinely different relationship with preregistration that emerged in a substantial minority of our participants. Sixteen of the forty-two interviewees (38.1%) described what we interpret as authentic behavioral change—shifts in how they conceptualize research questions, design studies, specify analyses, and interpret results that they attributed directly to the practice of preregistration. Crucially, these participants typically described an internal orientation toward the practice rather than merely an external compliance orientation.
Several described a kind of productive constraint that emerged from the discipline of writing down analysis plans before collecting data. Rather than experiencing this as a bureaucratic imposition, they described it as a tool that had improved the rigor of their own thinking.
"The thing that actually changed for me was the power analysis. Before I started preregistering seriously, I did power analyses but they were—I'll be honest—often reverse-engineered from the sample size I could afford. Now I actually have to commit to a number and justify it. That's changed how I think about what constitutes an adequately powered study." (Assistant Professor, Education)
"Writing out the analysis plan in detail forces you to realize that you don't actually know what you're going to do when the data come in. You think you know, but you don't. That uncertainty was always there; preregistration just made it visible. And then I had to resolve it before collecting a single data point." (Postdoctoral Researcher, Psychology)
The orientation these participants described involved treating preregistration not as a credential to be obtained but as a research planning tool. This is the distinction that Nosek et al. (2018) emphasized when they differentiated preregistration as a *transparency mechanism* from preregistration as a *constraint mechanism*. Our participants who exhibited genuine behavioral change were, implicitly, deploying it as the latter: imposing constraints on themselves as a device for disciplining their own analytical flexibility, independent of whether anyone else would ever scrutinize the document.
The bibliometric data partially corroborate this distinction, though attribution is necessarily indirect. Papers in which the preregistration was completed on platforms that require greater specificity—including sample size justification, exact operationalization of primary outcomes, and specification of exclusion criteria—showed significantly lower rates of undisclosed deviation (28.3%) compared with papers using less structured templates (51.7%). This suggests that template design, which varies considerably across platforms and journals, meaningfully affects fidelity outcomes, plausibly because more demanding templates are either selected by more committed preregistrants or produce more specific commitments that are harder to deviate from undetected.
Field-Level Variation and Disciplinary Context
Preregistration does not operate in a disciplinary vacuum. The meaning it carries, the infrastructure available for it, the norms governing its use, and the degree to which it fits the typical research workflow all vary substantially across fields. Our data reflect this variation clearly, and it complicates any uniform assessment of preregistration's effects.
Psychology participants—particularly those in experimental and quantitative subfields—generally described preregistration as institutionally normalized, if not uniformly embraced. They were most likely to have extensive experience with preregistration, most likely to describe colleagues who engaged seriously with the practice, and most likely to report that journal editors in their field could distinguish a credible from a perfunctory preregistration document. The reproducibility crisis that catalyzed the open science movement emerged most visibly from psychology, and the field has invested substantial collective resources in developing preregistration norms and infrastructure (van 't Veer & Giner-Sorolla, 2016).
Public health participants described a bifurcated landscape shaped by the long history of clinical trial registration, which has been legally mandated for many trial types in the United States and elsewhere since the early 2000s. Researchers working on randomized controlled trials described preregistration as simply part of the institutional landscape—neither celebrated nor resisted but treated as an unremarkable procedural requirement. By contrast, those working on observational studies, qualitative research, or secondary data analyses described far less clarity about what preregistration expected of them and greater skepticism about whether it added value to non-experimental work.
"I work with administrative data. The data already exist. What would it mean for me to preregister my analysis plan? Someone might say I can still be more transparent about my specification choices, and that's fair. But the template I'm given was designed for a randomized trial. It asks me about randomization procedures. It's not a good fit." (Associate Professor, Public Health)
This experience reflects a genuine methodological tension. Haven and Van Grootel (2019) documented the challenges that preregistration poses for qualitative research, and similar challenges arise for secondary data analysis, where the researcher cannot be blind to the structure of data they are analyzing and where research questions frequently emerge from data exploration rather than prior theory. The preregistration apparatus was developed primarily within an experimental paradigm and has been extended to other contexts with variable success.
Education and economics participants exhibited the greatest heterogeneity. Some education researchers working in randomized field trial contexts described preregistration as straightforwardly applicable; others working in ethnographic or case-study traditions described it as epistemologically incompatible with their research approach. Among economists, preregistration of field experiments has become increasingly common in development economics in particular, partly driven by funder requirements from agencies like the World Bank and major foundations, but economists working with observational data or using structural modeling approaches described strong skepticism about whether preregistration added anything beyond what standard replication procedures already provided.
These field-level differences have important implications for interpreting aggregate bibliometric evidence. Studies that meet the formal definition of "preregistered" are not a homogeneous category; they span a range of disciplinary commitments, template designs, institutional contexts, and researcher orientations that make simple comparisons between preregistered and non-preregistered research misleading if not disaggregated.
Bibliometric Evidence: Null Results, Effect Sizes, and Deviations
Despite the significant compliance issues documented above, our bibliometric analysis does reveal meaningful differences between preregistered and non-preregistered papers on several markers of research quality. These differences cannot be interpreted causally without strong assumptions, but they are substantively informative.
Null-result reporting rates. Among preregistered papers, 38.7% reported a non-significant primary outcome, compared with 14.3% of matched non-preregistered papers. In the logistic regression model adjusting for journal quartile, study design, and disciplinary cluster, preregistration status remained a significant predictor of null-result reporting (adjusted odds ratio = 3.71, 95% CI [2.84, 4.85]). This difference is large and is consistent with the hypothesis that preregistration reduces outcome-reporting bias by reducing the degree to which researchers can selectively report only significant findings. It aligns with the pattern observed in the clinical trial literature, where evidence suggests that studies registered prior to data collection are more likely to report null primary outcomes (Kaplan & Irvin, 2015).
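As a quick plausibility check (not part of the reported model), the unadjusted odds ratio implied by the raw rates can be computed directly and lands near the adjusted estimate:

```python
p_pre, p_non = 0.387, 0.143        # null-result rates reported above
odds_pre = p_pre / (1 - p_pre)     # ~0.63
odds_non = p_non / (1 - p_non)     # ~0.17
print(odds_pre / odds_non)         # ~3.78, close to the adjusted OR of 3.71
```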
However, this aggregate finding conceals important heterogeneity. When we stratify by protocol fidelity, the null-result reporting rate is 51.2% among papers with full fidelity to their registered plan, 41.8% among those with disclosed deviation only, and 24.6% among those with undisclosed deviation. The last figure is still higher than the 14.3% rate among non-preregistered papers, which suggests that even imperfect preregistration is associated with some reduction in reporting bias. But the gradient across fidelity categories is striking: papers that depart from their registered plan without acknowledgment look considerably more like non-preregistered papers than like faithfully preregistered ones.
Effect size distributions. The mean primary effect size (expressed as Cohen's d or equivalent) was 0.41 (SD = 0.29) for preregistered papers and 0.58 (SD = 0.34) for matched non-preregistered papers. The bootstrapped confidence interval for this difference (0.17, 95% CI [0.11, 0.23]) excludes zero. Smaller average effect sizes in preregistered studies are consistent with the hypothesis that publication bias inflates reported effects in the unreformed literature: when only significant findings are published, average reported effect sizes exceed true population effects because of selection on significance (Simmons et al., 2011; Wicherts et al., 2016).
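A percentile bootstrap is one standard way to obtain such an interval; the sketch below illustrates the idea on simulated per-study effect sizes and is not our exact analysis code.

```python
import numpy as np

def bootstrap_diff_ci(d_non, d_pre, n_boot=10_000, seed=1):
    """Percentile bootstrap CI for the difference in mean effect size
    (non-preregistered minus preregistered), resampling studies."""
    rng = np.random.default_rng(seed)
    diffs = [
        rng.choice(d_non, size=len(d_non)).mean()
        - rng.choice(d_pre, size=len(d_pre)).mean()
        for _ in range(n_boot)
    ]
    return np.percentile(diffs, [2.5, 97.5])

# Demo with effect sizes simulated at the reported means and SDs:
rng = np.random.default_rng(0)
d_pre = rng.normal(0.41, 0.29, 618)
d_non = rng.normal(0.58, 0.34, 618)
print(bootstrap_diff_ci(d_non, d_pre))
```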
Again, stratification by fidelity is instructive. Among papers with full fidelity, mean effect size was 0.37 (SD = 0.26); among those with undisclosed deviation, it was 0.49 (SD = 0.31). The latter is closer to the non-preregistered mean than to the full-fidelity mean. This pattern is consistent with the inference that undisclosed deviation preserves some of the selective reporting dynamics that preregistration is intended to eliminate.
Effect of template specificity. We observed a consistent relationship between the specificity of the preregistration template used and fidelity outcomes. Studies preregistered using templates requiring detailed specification of primary and secondary outcomes, statistical models, exclusion criteria, and sample size justification showed full fidelity in 48.6% of cases, compared with 24.3% for studies using minimally structured templates. This suggests that template design is a meaningful lever—one that has received less policy attention than the binary question of whether preregistration is required at all.
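The comparison of fidelity rates across template types reduces to a two-proportion test. The sketch below assumes group sizes of 300 and 318, which the text does not report; those counts are placeholders only.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Assumed (not reported) group sizes; fidelity rates from the text.
n_detailed, n_minimal = 300, 318
count = np.array([round(0.486 * n_detailed), round(0.243 * n_minimal)])
nobs = np.array([n_detailed, n_minimal])
z, p = proportions_ztest(count, nobs)
print(z, p)
```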
Discussion
The Compliance-Conviction Divide
Our central finding is that preregistration mandates produce a bimodal distribution of responses. A substantial portion of researchers—operating under institutional pressure but without deep investment in the purposes preregistration is meant to serve—engage in compliance behaviors that do not alter the underlying research practices responsible for the problems these policies were designed to address. A smaller but genuine proportion of researchers have internalized preregistration as a research planning tool and describe behavioral changes that appear to produce more reproducible science. The two groups exist in the same literature and are frequently indistinguishable in aggregate statistics.
This distinction maps onto a broader problem in science policy: the gap between procedural adoption and substantive reform. Munafò et al. (2017) outlined a manifesto for reproducible science that encompassed not just preregistration but a cluster of practices—open data, open materials, replication studies, rigorous statistical power, and transparent reporting—and explicitly noted that these practices are most effective when they reflect genuine epistemic commitments rather than credentialing behaviors. Our data suggest that the institutional conditions for the former have not yet been achieved in most research environments.
Why does compliance without conviction persist? The answer lies in the incentive structures that preregistration mandates have not disturbed. Tenure and promotion decisions at most research-intensive universities continue to prioritize publication quantity, journal prestige, and grant success. The relationship between these metrics and preregistration fidelity is weak or nonexistent in the current system. A researcher who preregisters faithfully and subsequently publishes a null result has done something epistemically valuable—reducing bias in the literature, providing an accurate estimate of effect size, contributing to cumulative scientific knowledge. But in most institutional contexts, that null result is worth less to the researcher's career than a significant result published in a higher-impact journal, regardless of whether the significant result reflected genuine prior prediction or post-hoc rationalization.
Several participants articulated this calculation explicitly.
"I understand the goals of preregistration. I genuinely do. But I also have a tenure case in two years, and my department head has told me I need a paper in a top-five journal. Preregistration doesn't help me get into a top-five journal. In some ways it makes it harder, because now the reviewers know what I predicted and they can see when my data didn't come out the way I said they would." (Assistant Professor, Psychology)
This is not an individual moral failing; it is a rational response to institutional incentives. Science policy that requires procedural compliance while leaving incentive structures unchanged should not expect to produce the behavioral changes it nominally targets. This is a structural problem, and it demands structural responses.
Preregistration as Signaling
Our results are consistent with interpreting preregistration, in its dominant current form, as a signaling mechanism rather than a behavioral intervention. In this framing, preregistration functions as a credentialing device that communicates methodological seriousness to editors, reviewers, and readers without necessarily producing the behavioral commitments that would make that signal valid. This distinction matters enormously for how we should evaluate preregistration mandates.
If preregistration were functioning primarily as a behavioral intervention—genuinely constraining analytical flexibility and committing researchers to specific outcomes—we would expect the fidelity rates we observe to be substantially higher than 36.2%. We would also expect the gap between preregistered and non-preregistered null-result reporting rates to be even larger than we find. Instead, we observe a pattern in which preregistration is associated with meaningful but incomplete improvements in research quality markers, improvements that are substantially concentrated among the minority of researchers who engage with the practice seriously.
The signal interpretation is reinforced by the observation that 73.8% of our participants perceived editors and reviewers as not carefully scrutinizing preregistration documents. If the signal is not verified—if it functions more like a label than a warranty—the incentive for surface-level compliance is obvious. And because the infrastructure for systematic verification (comparing registered documents with submitted manuscripts) is labor-intensive and currently informal, there are few institutional mechanisms that would raise the cost of non-credible signaling.
Hardwicke and Ioannidis (2018) documented the rapid growth of Registered Reports as a more structurally constrained form of preregistration in which editorial acceptance is contingent on study design rather than outcomes, and in which review of the registered plan occurs prior to data collection. This model has greater structural integrity than simple preregistration because it builds verification into the editorial process. Our data, though not directly focused on the Registered Reports format, are consistent with the view that pre-data peer review of study designs would strengthen the incentive for researchers to invest genuinely in their preregistration documents.
Toward More Effective Preregistration Policy
What follows from these findings for policy? We resist the temptation to recommend either the abandonment of preregistration mandates or their uncritical expansion. Instead, our analysis points toward several specific reforms that could narrow the compliance-conviction gap.
Template specificity as a policy lever. The consistent association in our data between template specificity and fidelity outcomes suggests that template design is a meaningful intervention point that has received insufficient policy attention. Mandating preregistration without specifying what a minimally adequate preregistration should contain allows the practice to degrade into vague, unconstrained commitments that preserve the signaling benefit of preregistration without its constraining function. Journal policies and funder requirements should specify minimum standards for preregistration content—including unambiguous primary outcome designation, pre-specified exclusion criteria, and detailed analysis plans for primary outcomes—as a condition for the preregistration label to be applied.
Deviation transparency norms. The 43.4% undisclosed deviation rate we observe suggests that the field lacks strong norms—and stronger institutional enforcement—around transparency when registered plans change. This is not an argument for treating preregistration as a straitjacket; legitimate reasons for protocol changes arise in every research program. The problem is undisclosed deviation, not deviation per se. Journals requiring preregistration should also require explicit accounting of deviations from registered plans, analogous to the CONSORT flow diagram conventions in clinical trial reporting. Without this requirement, preregistration becomes an audit mechanism that no one is conducting.
Aligning career incentives with preregistration goals. This is the harder reform, but it is ultimately the most important. As long as the reward structures governing researchers' careers are disconnected from the quality markers that preregistration is intended to improve, mandates will produce compliance rather than conviction for the majority of researchers who are rational actors in institutional environments. Funder assessment criteria, journal editorial standards, and university promotion processes all have roles to play in making research quality—including null results, replication studies, and high-fidelity adherence to preregistered plans—a genuine contributor to academic career success.
Field-specific implementation. The heterogeneity across disciplines in our data is a strong argument against one-size-fits-all preregistration mandates. The methodological fit between standard preregistration templates and experimental psychology or clinical trial research is much better than the fit with qualitative inquiry, secondary data analysis, or theoretical work. Mandating a procedure that is ill-suited to a particular research tradition is more likely to generate resentful box-ticking than genuine behavioral change. Field-specific standards, developed in genuine dialogue with researchers in those fields, are more likely to produce meaningful reforms than imported requirements from neighboring disciplines.
Conclusion
Preregistration has become one of the most prominent instruments of scientific reform in the past decade. The policy logic is compelling, and the empirical evidence—including our own—indicates that preregistration, when practiced with genuine commitment, is associated with meaningful improvements in observable markers of research quality. Null results appear more frequently. Effect sizes are smaller and more plausibly reflective of true population effects. The space for post-hoc rationalization narrows.
But preregistration mandates have also generated a substantial population of surface-level compliers: researchers who file preregistration documents without allowing those documents to meaningfully constrain their subsequent choices. The undisclosed deviation rate we observe (43.4%) is, in our judgment, not a marginal or incidental finding. It is a direct indicator of the degree to which the administrative expansion of preregistration has outpaced the cultural and institutional changes required for preregistration to function as intended.
The distinction between surface-level compliance and genuine behavioral change is not merely an academic one. If preregistration mandates create the appearance of methodological reform without its substance, they risk producing a literature that looks more credible than it is—adding a label of methodological virtue to research practices that have not fundamentally changed. This would compound rather than correct the credibility problems that motivated the open science movement in the first place.
Our analysis suggests that the path forward runs through structural reform rather than incremental expansion of current mandates. Template specificity, deviation transparency requirements, pre-data peer review, and realignment of career incentives with research quality markers are the levers most likely to convert formal compliance into the kind of genuine behavioral change that reproducible science requires. None of these reforms is simple. All of them require coordination across multiple institutional actors—journals, funders, universities, professional associations—whose incentives are not always aligned. But the alternative—treating procedural adoption as evidence of cultural transformation—is a mistake that science policy cannot afford to make.
This study has limitations that future work should address. Our interview sample, though purposively diverse, overrepresents researchers at research-intensive universities in anglophone contexts. Our bibliometric analysis is limited to published papers, which means we cannot speak to studies that remain unpublished despite completion of a preregistration. And our measure of undisclosed deviation, while carefully operationalized, requires inferential judgment in ambiguous cases. Replication with different samples, different disciplines, and different outcome measures would strengthen or qualify our conclusions. We regard the present study as evidence that the question of whether preregistration mandates produce genuine behavioral change deserves far more empirical attention than it has thus far received—and that the answer, at least as current policy is designed, is more complicated than advocates on either side have acknowledged.
References
Chambers, C. D. (2013). Registered reports: A new publishing initiative at Cortex. Cortex, 49(3), 609–610. https://doi.org/10.1016/j.cortex.2012.12.016
Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). SAGE Publications.
Hardwicke, T. E., & Ioannidis, J. P. A. (2018). Mapping the universe of registered reports. Nature Human Behaviour, 2(11), 793–796. https://doi.org/10.1038/s41562-018-0444-y
Haven, T. L., & Van Grootel, D. L. (2019). Preregistering qualitative research. Accountability in Research, 26(3), 229–244. https://doi.org/10.1080/08989621.2019.1580147
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
Kaplan, R. M., & Irvin, V. L. (2015). Likelihood of null effects of large NHLBI clinical trials has increased over time. PLOS ONE, 10(8), e0132382. https://doi.org/10.1371/journal.pone.0132382
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Van Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168–171. https://doi.org/10.1038/s41562-018-0311-x
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), Article 0021. https://doi.org/10.1038/s41562-016-0021
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716. https://doi.org/10.1126/science.aac4716
Pham, M. T., & Oh, T. T. (2021). Preregistration is neither sufficient nor necessary for good science. Journal of Consumer Psychology, 31(1), 163–176. https://doi.org/10.1002/jcpy.1209
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Soderberg, C. K., Errington, T. M., Schiavone, S. R., Bottesini, J., Thorn, F. S., Vazire, S., Esterling, K. M., & Nosek, B. A. (2021). Initial evidence that pre-registration reduces the gap between initial and replication effect sizes. Royal Society Open Science, 8(4), Article 200527. https://doi.org/10.1098/rsos.200527
van 't Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. https://doi.org/10.1016/j.jesp.2016.03.004
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638. https://doi.org/10.1177/1745691612463078
Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, Article 1832. https://doi.org/10.3389/fpsyg.2016.01832