Patterns Before Physics

Latent Scholar

By: Admin · Posted on: May 19, 2026

Research Note · AI in Scholarship

Ask a large language model for a new scientific equation in plain English, and it will give you one. But it will not be reasoning from physics. It will be reaching for the structural patterns its training data showed it most often — and that is a different thing.

By Latent Scholar · Published May 2026 · Reading time 10 min

LLM capabilities evolve rapidly. The behaviour documented here is current — not permanent. Future models may close some or all of the gaps shown below; see the note in “Physics is not the priority” for context.

9 · Equations generated

0 · Beat the human benchmark

94% · Highest error in a single equation

When a graduate student is asked to invent a new equation for an under-studied physical problem, the first thing they do is not write an equation. They read. They check what the underlying physics requires. They sketch dimensional arguments on a whiteboard. They look at where existing equations succeed and where they fail. The equation, when it finally arrives, is the output of a reasoning process whose primary constraint is physical consistency.

A large language model, asked the same question in plain English, does not do this. It does something else — something faster, more confident, and structurally familiar. The question this study set out to answer was: what, exactly, is that something?

Choosing a problem with room for new equations

To test whether an LLM can generate a genuinely new physical equation, you need a problem where genuinely new equations are still being written. Most formulas for predicting how a sediment particle settles in water have been developed and tuned for quartz and silica sands. Carbonate sands are different. They are biogenic in origin, irregularly shaped, and the literature on them is comparatively thin. That asymmetry is the test: if an LLM is reasoning from physical principles, the relative scarcity of carbonate-specific formulas is exactly the gap it should be able to fill. If it is doing something else, the scarcity will expose it.

The setup. Nine LLM-generated equation pairs, each consisting of a drag-coefficient (C_D) equation and a settling-velocity (ω) equation, were compared against an experimental dataset of 998 calcareous sand grains from Oahu (Smith and Cheung, 2003) and against a published human-developed equation (Riazi et al., 2020) that an independent 2024 review identified as the most accurate existing formulation for carbonate sediments. Two models, Gemini (Thinking) and ChatGPT, were prompted in plain conversational English: five iterations from Gemini, four from ChatGPT.

All nine LLM-generated equations performed worse than the human benchmark. The best of them had roughly twice the benchmark’s error (20.72% against 10.14%). The worst had a mean relative error of 93.80%.

[Chart: mean relative error vs. experimental data]

What the models reached for

The clearest evidence that the models were not reasoning from physics is in the equations themselves. Across both models, every equation rests on the same skeleton — a Stokes-like low-Reynolds term plus a high-Reynolds asymptote — that has been the dominant form in the sediment-transport literature for more than three decades. What varies between the nine equations is the decoration: which exponents on the shape factor, which numerical coefficients, occasionally a hyperbolic tangent or an exponential thrown in. The form is borrowed. The numbers are new.

Each card below shows both the drag-coefficient equation (C_D) and the settling-velocity equation (ω) produced by the same model in the same session. The two are presented together because, in physics, they are not separable.

Gemini · 5 iterations
All five equations follow the same structural family.

Across Gemini’s five equations, no single literature paper stands out as the anchor — but they all share the same skeleton. Each one is a low-Reynolds Stokes term plus a high-Reynolds asymptote, with shape-factor corrections grafted onto the coefficients. This additive form has been the dominant template for non-spherical particle drag equations in the sediment-transport literature for more than three decades, going back at least to Haider and Levenspiel (1989) and refined by Cheng (1997), Wu and Wang (2006), and many others since. In its general shape:

Standard literature skeleton

$C_D = \frac{24}{Re}\cdot f(\Psi, Re) + g(\Psi, Re)$

A low-Reynolds Stokes term, a high-Reynolds asymptote, and shape-factor decorations. Every Gemini equation is a variation on this template.

And here are all five equation pairs Gemini produced for carbonate sands. Each drag coefficient is a different decoration of the same additive skeleton — different exponents on the shape factor, different numerical coefficients, occasionally a hyperbolic tangent or an exponential thrown in — but the underlying form does not change.

Gemini 1
MRE 39.57%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\Psi^{-1.5} + \frac{0.42}{\Psi^2}\left(1 + \frac{12}{\sqrt{Re\Psi}}\right)$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left[\sqrt{\left(\frac{10.5}{\Psi}\right)^2 + 1.1\,\Psi^{1.5} D_{*}^3} - \frac{10.5}{\Psi}\right]$

Gemini 2 · Worst
MRE 93.80%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\left(1 + 0.18Re^{0.65}\right)\Psi^{-1.2} + \frac{0.48}{\Psi^{2.4}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left(\frac{\Psi^{1.5} D_*^2}{18 + 0.35(\Psi D_*)^{1.6}}\right)$

Gemini 3 · Best LLM
MRE 20.72%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\Psi^{-1.2} + \frac{0.52}{\Psi^2 ER^{0.5}}\left(\tanh\left(\frac{40}{Re}\right)\right) + \frac{1.4(1-\Psi)}{Re^{0.5}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left[\sqrt{\left(\frac{11.5}{\Psi}\right)^2 + \frac{1.05 D_*^3 \Psi^{0.5}}{1 + 0.2(1 - ER)}} - \frac{11.5}{\Psi}\right]$

Gemini 4
MRE 25.28%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\left(1 + 0.18Re^{0.65}\right)\Psi^{-1.5} + \frac{0.48}{\Psi^{2.2}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left[\sqrt{\left(\frac{11.2}{\Psi^{1.1}}\right)^2 + \frac{1.04\,D_{*}^{3}}{\Psi^{0.5}}} - \frac{11.2}{\Psi^{1.1}}\right]$

Gemini 5
MRE 26.09%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\Psi^{-1.5} + \frac{0.5}{\Psi^{2.3}}\left[1 - e^{-0.02Re\Psi}\right]$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left[\sqrt{\left(\frac{11.5}{\Psi^{1.2}}\right)^2 + \frac{1.02\,D_{*}^{3}}{\Psi^{0.6}}} - \frac{11.5}{\Psi^{1.2}}\right]$

ChatGPT · 4 iterations
All four equations follow the same structural family.

Where Gemini drew from the general literature template, ChatGPT did something more specific: every one of its four equations matches the exact form Haider and Levenspiel proposed in 1989. Same skeleton, same structural slots, only the numerical decorations differ between iterations:

Haider & Levenspiel (1989) · the pattern

$C_D = \frac{24}{Re}\left(1 + aRe^b\right) + \frac{c}{1 + \frac{d}{Re^e}}$

The skeleton ChatGPT reaches for, four times in a row.
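The "same slots, different numbers" point can be made concrete with a minimal Python sketch, assuming the five-coefficient form above. The names `hl_skeleton`, `CHATGPT_COEFFS`, and `cd_chatgpt2` are illustrative and not part of the study. The coefficient tuples are read off the ChatGPT drag equations listed in this post with the shape parameter (M or φ) set to 1, and Re_M is treated as an ordinary Reynolds number, which is an assumption because the post does not define it.

```python
def hl_skeleton(re, a, b, c, d, e):
    """Five-slot drag form: 24/Re * (1 + a*Re^b) + c / (1 + d/Re^e)."""
    return (24.0 / re) * (1.0 + a * re**b) + c / (1.0 + d / re**e)


# (a, b, c, d, e) for each ChatGPT iteration, with the shape parameter
# set to 1 (assumption made for illustration only).
CHATGPT_COEFFS = {
    "ChatGPT 1": (0.22, 0.62, 0.45, 42500.0, 1.15),
    "ChatGPT 2": (0.18, 0.67, 0.42, 6000.0, 1.10),
    "ChatGPT 3": (0.21, 0.63, 0.44, 8000.0, 1.10),
    "ChatGPT 4": (0.20, 0.65, 0.43, 7500.0, 1.10),
}


def cd_chatgpt2(re, phi):
    """ChatGPT 2 drag equation as printed in the post, shape factor phi."""
    stokes_part = (24.0 / re) * (1.0 + 0.18 * re**0.67 / phi**0.35)
    plateau_part = 0.42 * phi**-0.25 / (1.0 + 6000.0 / re**1.1)
    return stokes_part + plateau_part


# At phi = 1 the printed equation and the generic skeleton coincide.
difference = cd_chatgpt2(100.0, 1.0) - hl_skeleton(100.0, *CHATGPT_COEFFS["ChatGPT 2"])
```

Only the five numbers change from one iteration to the next; the function body never has to.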

Each ChatGPT pair below carries the same skeleton, with different coefficients:

ChatGPT 1
MRE 23.22%

Drag coefficient · C_D

$C_D = \frac{24}{Re_M}\left(1 + 0.22Re_M^{0.62}\right) + \frac{0.45M^{0.3}}{1 + \frac{42500}{Re_M^{1.15}}}$

Settling velocity · ω

$\omega = \sqrt{\frac{4 g d_n (s-1)}{3 C_D M^{0.15}}}$

ChatGPT 2
MRE 41.70%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\left(1 + \frac{0.18Re^{0.67}}{\phi^{0.35}}\right) + \frac{0.42\phi^{-0.25}}{1 + \frac{6000}{Re^{1.1}}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left(\frac{D_{*}^{3}\cdot\phi^{0.6}}{18 + 0.8D_{*}^{1.65}\cdot\phi^{-0.3}}\right)$

ChatGPT 3
MRE 59.21%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\left(1 + \frac{0.21Re^{0.63}}{\phi^{0.40}}\right) + \frac{0.44\phi^{-0.25}}{1 + \frac{8000}{Re^{1.1}}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left(\frac{D_{*}^{3}\cdot\phi^{0.55}}{18 + 0.75D_{*}^{1.74}\cdot\phi^{-0.35}}\right)$

ChatGPT 4
MRE 60.41%

Drag coefficient · C_D

$C_D = \frac{24}{Re}\left(1 + \frac{0.20Re^{0.65}}{\phi^{0.35}}\right) + \frac{0.43\phi^{-0.25}}{1 + \frac{7500}{Re^{1.1}}}$

Settling velocity · ω

$\omega = \frac{\nu}{d_n}\left(\frac{D_{*}^{3}\cdot\phi^{0.55}}{18 + 0.8D_{*}^{1.74}\cdot\phi^{-0.30}}\right)$

Two model families, nine iterations. The result is the same: an existing skeleton with new numbers attached. The model has not invented anything. It has retrieved the dominant pattern from its training distribution and decorated it.

Note

Two questions, no coupling

In physics, the drag coefficient and the settling velocity are not independent quantities. A particle settles at exactly the velocity at which gravity balances drag. That force balance is what defines the settling velocity in terms of C_D — meaning the two equations are mathematically linked. A human researcher writing one would derive the other from it, or at minimum verify that the pair is consistent under force balance.

The LLMs did not do this. The drag-coefficient equation and the settling-velocity equation in each card above were generated as separate answers to separate prompts. No coupling between them is enforced. Compare any pair within a single card and you will not find C_D and ω that satisfy force balance for the same particle. The models treated two physically linked questions as two independent text-generation problems — which is itself a signature of the failure mode this post is documenting. The physics that ties C_D and ω together is the kind of constraint a reasoning system would maintain. A pattern-matching system would not, and did not.
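For readers who want to see what enforcing the coupling would look like, here is a minimal Python sketch. It assumes the sphere force balance ω² = 4 g d (s − 1) / (3 C_D), which is the form that appears (with an extra M^0.15 factor) in the ChatGPT 1 settling equation; shape-corrected balances such as the one in Riazi et al. (2020) differ in their prefactors. The function name, the bracket limits, and the sample values for d, s and ν are hypothetical choices, not values from the study.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def settling_velocity_from_cd(cd_of_re, d, s, nu, lo=1e-9, hi=10.0, iters=200):
    """Solve w^2 * C_D(w*d/nu) = (4/3) g d (s - 1) for w by bisection.

    Assumes the left-hand side grows monotonically with w and that the
    root lies inside [lo, hi] (both in m/s).
    """
    target = 4.0 * G * d * (s - 1.0) / 3.0

    def residual(w):
        return w * w * cd_of_re(w * d / nu) - target

    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits a wide range
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


# Sanity check with pure Stokes drag, C_D = 24/Re, where the answer is
# known in closed form: w = (s - 1) g d^2 / (18 nu).
d, s, nu = 1e-4, 2.7, 1e-6  # hypothetical grain size, specific gravity, viscosity
w_numeric = settling_velocity_from_cd(lambda re: 24.0 / re, d, s, nu)
w_stokes = (s - 1.0) * G * d**2 / (18.0 * nu)
```

Passing any of the drag-coefficient equations in this post as `cd_of_re` gives the settling velocity that the drag equation itself implies, which can then be compared with the separately generated ω formula.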

The numbers

Each of the nine generated equation pairs was evaluated against the experimental settling-velocity dataset using mean relative error (MRE). The human-developed benchmark of Riazi et al. (2020) is included for comparison.

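| Equation | Source | MRE |
| --- | --- | --- |
| Riazi et al. (2020) | Human | 10.14% |
| Gemini 3 | Gemini | 20.72% |
| ChatGPT 1 | ChatGPT | 23.22% |
| Gemini 4 | Gemini | 25.28% |
| Gemini 5 | Gemini | 26.09% |
| Gemini 1 | Gemini | 39.57% |
| ChatGPT 2 | ChatGPT | 41.70% |
| ChatGPT 3 | ChatGPT | 59.21% |
| ChatGPT 4 | ChatGPT | 60.41% |
| Gemini 2 | Gemini | 93.80% |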

All errors measured against the settling-velocity equation derived from each model’s own drag-coefficient output, evaluated against the experimental dataset.
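As a reference for how a figure like these is computed, here is a short Python sketch. It assumes the usual definition of mean relative error, the average of |predicted − observed| / observed expressed as a percentage; the post does not spell out its exact formula, so treat this as an assumption. The dictionary simply restates the reported values, and the function names are illustrative.

```python
def mean_relative_error(predicted, observed):
    """Mean of |p - o| / |o| over all samples, as a percentage."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed))
    return 100.0 * total / len(observed)


# Reported MRE values from the post, in percent.
REPORTED_MRE = {
    "Riazi et al. (2020)": 10.14,
    "Gemini 3": 20.72,
    "ChatGPT 1": 23.22,
    "Gemini 4": 25.28,
    "Gemini 5": 26.09,
    "Gemini 1": 39.57,
    "ChatGPT 2": 41.70,
    "ChatGPT 3": 59.21,
    "ChatGPT 4": 60.41,
    "Gemini 2": 93.80,
}


def ratio_to_benchmark(name, benchmark="Riazi et al. (2020)"):
    """How many times larger an equation's MRE is than the benchmark's."""
    return REPORTED_MRE[name] / REPORTED_MRE[benchmark]
```

With these numbers, `ratio_to_benchmark("Gemini 3")` is a little over 2, which is where the "roughly twice the error" description comes from.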

The endpoints

Three equation pairs are worth looking at together: the human benchmark, the best LLM pair, and the worst. The contrast is the story.

Human Benchmark

10.14%

Riazi et al. (2020)

Settling velocity

$\omega^2 = \frac{11}{15}\cdot\frac{(S-1)g}{C_D}\cdot S_f^{2/3}\cdot d_n$

where:

$C_D = \left(\frac{9.50\nu}{d_n^{1.5} g^{0.5}} + 0.76\right)^{2.92} + \left(\frac{20.47\nu}{d_n^{1.5} g^{0.5}} + 1.02\right)^{-48.15}$

Settling velocity is derived from drag through force balance — the two equations are linked, not independent.

Best LLM Result

20.72%

Gemini 3

Settling velocity

$\omega = \frac{\nu}{d_n}\left[\sqrt{\left(\frac{11.5}{\Psi}\right)^2 + \frac{1.05 D_*^3 \Psi^{0.5}}{1 + 0.2(1 - ER)}} - \frac{11.5}{\Psi}\right]$

Roughly twice the error of the human benchmark. This is the best the nine LLM attempts could produce, and even this settling-velocity equation is not consistent with the model’s own drag coefficient.

Worst — And Unphysical

93.80%

Gemini 2

Settling velocity

$\omega = \frac{\nu}{d_n}\left(\frac{\Psi^{1.5} D_*^2}{18 + 0.35(\Psi D_*)^{1.6}}\right)$

Predicts that more spherical particles settle more slowly — the opposite of the established physical behaviour.

In Gemini 2, the Corey shape factor sits in the equation in a position that inverts its physical role. The equation is well-formed, dimensionally consistent, and structurally similar to the rest. It is also wrong about basic physics in a way that an expert would catch in seconds. The same model produced both this and the lowest-error LLM equation in the study, in successive prompts. There is no internal consistency check connecting one attempt to the next.

What the equations actually do

The divergence between LLM equations and the physical reference becomes most visible at the limits of particle shape. For nearly spherical particles, most equations cluster near the established curves. As particles become more irregular — the regime where carbonate sediments actually live — the LLM equations spread out, and several of them leave the physical envelope altogether.

**Figure.** Drag coefficient as a function of Reynolds number at Corey shape factor Ψ = 0.2 (highly elongated or platy particles), for the LLM-generated equations and the Wu and Wang (2006) reference. The Wu and Wang (2006) reference (magenta) levels off to a constant at high Re, as physics requires. Several LLM equations, including Gemini 3, the best-performing one, instead continue to decline at high Reynolds numbers, behaviour that is structurally wrong.

At low irregularity, statistical pattern-matching and physical reasoning produce similar-looking answers because the patterns embedded in the training data do encode the physics for that regime. At high irregularity — the part of the problem the LLM was supposedly asked to solve — pattern-matching and physical reasoning come apart, and what the LLMs return is a fan of inconsistent extrapolations from a literature that was written for a different material. Notice in particular that several curves — including the best-performing Gemini 3 — continue to fall at high Reynolds numbers when they should be approaching a constant asymptote. The equations have the right skeleton but the wrong asymptotic behaviour.
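The asymptote problem can be checked directly from the printed equations. The Python sketch below evaluates the Gemini 3 and Gemini 4 drag coefficients at Ψ = 0.2 for increasing Reynolds number. The post does not define ER in the Gemini 3 equation, so the value used here is a hypothetical placeholder; it only rescales one term and does not change the high-Re trend. The function names are illustrative.

```python
import math


def cd_gemini3(re, psi, er):
    """Gemini 3 drag coefficient as printed; every term decays with Re."""
    stokes = (24.0 / re) * psi**-1.2
    plateau = 0.52 / (psi**2 * er**0.5) * math.tanh(40.0 / re)
    transition = 1.4 * (1.0 - psi) / re**0.5
    return stokes + plateau + transition


def cd_gemini4(re, psi):
    """Gemini 4 drag coefficient as printed; the last term is a constant."""
    return (24.0 / re) * (1.0 + 0.18 * re**0.65) * psi**-1.5 + 0.48 / psi**2.2


def high_re_trend(cd, reynolds=(1e4, 1e6, 1e8)):
    """Evaluate a drag law at a few large Reynolds numbers."""
    return [cd(re) for re in reynolds]


PSI = 0.2
ER = 0.5  # hypothetical value; ER is not defined in the post

g3 = high_re_trend(lambda re: cd_gemini3(re, PSI, ER))
g4 = high_re_trend(lambda re: cd_gemini4(re, PSI))
g4_plateau = 0.48 / PSI**2.2  # constant that Gemini 4 approaches
```

The Gemini 3 values keep shrinking towards zero because tanh(40/Re) vanishes at large Re, whereas the Gemini 4 values settle towards `g4_plateau`.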

Three things you can see in the output

Each of the following is visible in the data itself, not inferred from outside. Together they describe a specific failure mode that recurs under casual prompting.

01

Same skeleton, different decoration

Every equation rests on the same template. The decoration varies; the underlying form does not.

5/5 Gemini equations share the standard additive literature skeleton

4/4 ChatGPT equations match Haider & Levenspiel (1989) exactly

The models were explicitly asked for novel carbonate-specific formulas. What came back was the dominant family of quartz/silica formulas, with shape-factor terms and exponents adjusted. ChatGPT pinned to one specific historical anchor; Gemini drew from the broader literature template more loosely. Neither departed from the basic Stokes-plus-asymptote skeleton that has been in the sediment-transport literature since 1989. The shape of the formulas the model had seen most often in training is the shape it returned, every time.

02

No Self-Evaluation

Asked to improve, the models produced variations. Not refinements.

MRE across iterations (%), left to right

Gemini: 39.57, 93.80, 20.72, 25.28, 26.09

ChatGPT: 23.22, 41.70, 59.21, 60.41

Gemini ricochets between attempts — best and worst in successive prompts. ChatGPT, asked to improve at every step, instead gets monotonically worse. Neither model is using the prior attempt as a starting point. They regenerate from a similar distribution of possibilities each time, without filtering for whether the new one is better than the last.
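The two trajectories can be compared with a few lines of Python. This is only a sketch over the reported MRE values; the helper names are illustrative.

```python
GEMINI_MRE = [39.57, 93.80, 20.72, 25.28, 26.09]
CHATGPT_MRE = [23.22, 41.70, 59.21, 60.41]


def step_changes(series):
    """Change in MRE (percentage points) from each attempt to the next."""
    return [round(b - a, 2) for a, b in zip(series, series[1:])]


def worsened_every_step(series):
    """True if every attempt has a higher MRE than the one before it."""
    return all(change > 0 for change in step_changes(series))


def improved_every_step(series):
    """True if every attempt has a lower MRE than the one before it."""
    return all(change < 0 for change in step_changes(series))
```

`worsened_every_step(CHATGPT_MRE)` holds, while neither helper returns true for the Gemini series, whose changes swing in both directions.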

03

Best and Worst — Same Model

Gemini wrote the best LLM equation. Gemini also wrote the unphysical one.

Range from the same model, on the mean relative error scale: human benchmark 10.14% · Gemini 3 20.72% · Gemini 2 93.80%

Both extremes produced by the same model, in successive prompts.

In Gemini 2, the Corey shape factor appears in the denominator as $(\Psi D_*)^{1.6}$, which acts like a negative exponent in the numerator. Where that term dominates the denominator, it outweighs the $\Psi^{1.5}$ in the numerator and the net dependence becomes $\Psi^{-0.1}$. In that regime the formula predicts settling velocity decreasing as particles become more spherical, the opposite of established physical behaviour. The same model produced this and the 20.72% equation in successive prompts, which is to say: nothing in the model’s process is filtering out the physically inverted answer.
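The shape dependence of the Gemini 2 settling equation can be probed numerically. The Python sketch below works with the dimensionless velocity W = ω d_n / ν and computes the local exponent d ln W / d ln Ψ at fixed D*. The sample values of Ψ and D* are hypothetical probes chosen for illustration, not points from the experimental dataset, and the function names are not from the study.

```python
def w_gemini2(psi, d_star):
    """Dimensionless settling velocity w*d_n/nu from the Gemini 2 equation."""
    return psi**1.5 * d_star**2 / (18.0 + 0.35 * (psi * d_star)**1.6)


def shape_exponent(psi, d_star):
    """Local exponent d ln(W) / d ln(psi) at fixed D*.

    With X = 0.35 * (psi * D*)^1.6 the derivative is 1.5 - 1.6 * X / (18 + X),
    which tends to 1.5 for small X and to -0.1 for large X.
    """
    x = 0.35 * (psi * d_star)**1.6
    return 1.5 - 1.6 * x / (18.0 + x)


# Hypothetical probe points: a small and a large dimensionless diameter.
small_d_star, large_d_star = 10.0, 1000.0
inverted_at_large = w_gemini2(1.0, large_d_star) < w_gemini2(0.4, large_d_star)
normal_at_small = w_gemini2(1.0, small_d_star) > w_gemini2(0.4, small_d_star)
```

Under these assumptions the exponent is positive at the small probe and negative at the large one, so the inversion is a property of the regime in which the shape term dominates the denominator.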

Physics is not the priority

The most useful way to read this study is not as a verdict on what LLMs can do, but as an observation about what they do when no one is watching. Asked in plain English to invent a new scientific equation, today’s general-purpose LLMs do not start from the physics. They start from the most common structural patterns in their training data and decorate them with shape-factor terms. The output is fluent, dimensionally consistent, and almost always wrong by a margin that a domain expert would notice immediately.

A model that produces a manifestly nonsensical equation is easy to dismiss. A model that produces a plausible-looking equation that quietly violates the physics is harder to catch — and more consequential when missed.

The gap between human and LLM equation-writing in these results is not really a gap of capability. It is a gap of priority. A human writing an equation for sediment settling cannot help but think about whether the equation behaves correctly in limiting cases: what happens as the particle approaches a sphere, what happens at very low Reynolds number, what happens when the shape factor goes to zero. These checks are not separate from writing the equation; they are the constraints that shape what gets written in the first place. An LLM, prompted casually, has no such constraints. It optimises for the next plausible token. When the prompt asks for a “novel equation for carbonate sands,” “novel” turns out to mean a familiar structure with different numbers, and “for carbonate sands” turns out to be a label rather than a constraint.

This is the central observation. The models are not failing to access physics; they are not reaching for it. Under casual prompting, physical consistency is simply not in the objective function. Statistical fluency is.

What writing a scientific equation actually requires

Physical consistency — the equation must respect conservation laws, dimensional homogeneity, and the symmetries of the system it describes.
Correct limiting behaviour — the equation must give the right answer in regimes where the right answer is already known (low Reynolds number, spherical particles, zero shape factor).
Causal grounding — the relationships between variables must reflect actual physical mechanisms, not statistical correlations from one regime extrapolated to another.
Internal coupling — equations that describe related quantities (such as drag coefficient and settling velocity) must be consistent with each other under the physical laws that link them.
Validation against data — the equation must be tested against experimental measurements, with discrepancies treated as evidence of error rather than acceptable variation.

None of these are in the LLM’s objective function under casual prompting. The model produces a sequence of tokens that is statistically plausible given the prompt and the training distribution. Whether the resulting equation respects any of the five constraints above is a question the model never asks.
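A limiting-behaviour check of the kind listed above is cheap to automate. The Python sketch below tests whether a settling-velocity formula recovers Stokes' law for a sphere (Ψ = 1) at small dimensionless diameter. It assumes the standard definition D*³ = (s − 1) g d³ / ν², under which Stokes' law reads W = ω d / ν = D*³ / 18; the post does not define D*, so this is an assumption. The function names and the probe value D* = 0.1 are illustrative.

```python
import math


def w_stokes(d_star):
    """Stokes' law in dimensionless form, assuming the standard D*."""
    return d_star**3 / 18.0


def w_gemini1(psi, d_star):
    """Dimensionless settling velocity from the Gemini 1 equation."""
    a = 10.5 / psi
    return math.sqrt(a * a + 1.1 * psi**1.5 * d_star**3) - a


def w_gemini2(psi, d_star):
    """Dimensionless settling velocity from the Gemini 2 equation."""
    return psi**1.5 * d_star**2 / (18.0 + 0.35 * (psi * d_star)**1.6)


def stokes_ratio(w_func, d_star=0.1):
    """Ratio of a formula's sphere prediction to Stokes' law at small D*.

    A value close to 1 means the known low-Reynolds limit is recovered.
    """
    return w_func(1.0, d_star) / w_stokes(d_star)


ratio_g1 = stokes_ratio(w_gemini1)
ratio_g2 = stokes_ratio(w_gemini2)
```

Under these assumptions the Gemini 1 ratio comes out a little below 1, while the Gemini 2 ratio grows without bound as D* shrinks, because its numerator scales as D*² rather than D*³.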

Snapshot

A note on time

This study was conducted on 11 March 2026 using the Gemini Thinking and ChatGPT (free tier) versions available at that moment. LLM capabilities are evolving rapidly. The behaviour documented here is a snapshot, not a permanent property of these systems.

Future model versions may close some or all of the gaps shown here. That is the expected trajectory, and Latent Scholar expects to repeat this kind of test as the models change. The point of documenting current behaviour is precisely to give future progress a fixed reference to be measured against. If a successor model produces a physically correct, novel carbonate-sand equation under the same prompts a year from now, that will be a real result, and this post will be the baseline it improves upon.

The pattern documented here may not be permanent. But it is the default that the current generation of general-purpose LLMs returns under casual prompting, and that default is what most users will see most of the time, until something in the model’s objective function changes to put physical consistency on equal footing with statistical fluency.

The Finding

As of March 2026, large language models prompted in plain English generate scientific equations by reproducing the structural patterns most represented in their training data rather than by reasoning from physical principles. The resulting equations are fluent and well-formed; they are also unreliable, sometimes physically inverted, internally uncoupled, and consistently outperformed by equations written by humans whose primary constraint is physical consistency rather than statistical fluency.

Methodology and sources

Two LLMs were used: Gemini in Thinking mode and ChatGPT (free tier), both queried on 11 March 2026. Each model was prompted, in plain conversational English, to produce both a drag-coefficient equation and a settling-velocity equation for carbonate sand particles, using a fixed list of physical parameters (particle dimensions, Corey shape factor, nominal diameter, specific gravity, kinematic viscosity). The two equations were generated within the same session per model but as separate prompts — five iterations for Gemini, four for ChatGPT, for nine equation pairs in total.

Performance was measured against the experimental dataset of Smith and Cheung (2003), 998 calcareous sand grains from Oahu, Hawaii. The human benchmark is the equation of Riazi et al. (2020), identified by Chen et al. (2024) as the most accurate existing formulation for carbonate sediments. Reference comparisons for the drag-coefficient plot use Wu and Wang (2006). The pattern-anchor analysis identifies Haider and Levenspiel (1989) as the specific historical form ChatGPT’s equations follow; Gemini’s equations share the broader additive Stokes-plus-asymptote template that has been dominant in the sediment-transport literature since the late 1980s, without pinning to a single named source.

Selected references

Chen, J., et al. (2024). Experimental study on the settling motion of coral grains in still water. Journal of Fluid Mechanics, 990, A15.

Cheng, N. S. (1997). Simplified settling velocity formula for sediment particle. Journal of Hydraulic Engineering, 123(2), 149–152.

Haider, A., & Levenspiel, O. (1989). Drag coefficient and terminal velocity of spherical and nonspherical particles. Powder Technology, 58(1), 63–70.

Riazi, A., Vila-Concejo, A., Salles, T., & Türker, U. (2020). Improved drag coefficient and settling velocity for carbonate sands. Scientific Reports, 10(1), 9465.

Smith, D. A., & Cheung, K. F. (2003). Settling characteristics of calcareous sand. Journal of Hydraulic Engineering, 129(6), 479–483.

Wu, W., & Wang, S. S. (2006). Formulas for sediment porosity and settling velocity. Journal of Hydraulic Engineering, 132(8), 858–862.

Cite this post

If you are referring to this analysis in academic or technical writing, please use one of the formats below.

APA (7th edition)

Latent Scholar. (2026, May). Patterns before physics: What LLMs reach for when asked to invent a scientific equation. Latent Scholar Research Notes. https://latentscholar.org/patterns-before-physics/

IEEE

Latent Scholar, “Patterns before physics: What LLMs reach for when asked to invent a scientific equation,” Latent Scholar Research Notes, May 2026. [Online]. Available: https://latentscholar.org/patterns-before-physics/

MLA (9th edition)

Latent Scholar. “Patterns Before Physics: What LLMs Reach For When Asked to Invent a Scientific Equation.” Latent Scholar Research Notes, May 2026, latentscholar.org/patterns-before-physics/.

Chicago (Author-Date)

Latent Scholar. 2026. “Patterns Before Physics: What LLMs Reach For When Asked to Invent a Scientific Equation.” Latent Scholar Research Notes, May. https://latentscholar.org/patterns-before-physics/.

BibTeX

@misc{latentscholar2026patterns,
  author       = {{Latent Scholar}},
  title        = {Patterns Before Physics: What {LLMs} Reach For When Asked to Invent a Scientific Equation},
  year         = {2026},
  month        = {May},
  howpublished = {Latent Scholar Research Notes},
  url          = {https://latentscholar.org/patterns-before-physics/}
}