OCXLY
OIAS · Research Article · 17 min read

The Neuroscience of Spaced Repetition: Why Forgetting Is the Secret to Remembering

From Ebbinghaus’s 1885 forgetting curve to FSRS and AI-optimized review schedules — the neuroscience of why spacing works, and how modern algorithms are finally catching up to the brain.


In 1885, Hermann Ebbinghaus sat alone in his study and memorised lists of nonsense syllables — meaningless consonant-vowel-consonant trigrams like DAX, BUP, and ZOL. Then he measured how quickly he forgot them. The result was one of the most elegant and durable findings in all of psychology: a steep exponential decay that flattens with each subsequent review[1]. Memory, Ebbinghaus showed, does not simply fade — it decays at a predictable rate, and that decay can be counteracted by revisiting the material at strategically expanding intervals. For over 140 years, this finding has been replicated across languages, age groups, and material types. It is, by any reasonable standard, settled science.

Yet most educational systems still rely on massed practice, better known as cramming: textbook chapters followed by end-of-chapter tests, and revision weeks that compress an entire semester into five sleepless nights. This persists even though the testing effect, also known as retrieval practice (the act of recalling information rather than passively re-reading it), is one of the most well-established phenomena in cognitive psychology. It generalises across ages, materials, and test formats, and is consistently more beneficial than common study strategies like highlighting or summarising[4][5]. The evidence is not in question. The implementation is.

The question today is not whether spacing and retrieval work; it is whether artificial intelligence can finally automate the optimal schedule for each individual learner. From Piotr Wozniak’s SM-2 algorithm in 1987 to the open-source FSRS scheduler that was built into Anki in 2023, the field has been converging on a single goal: a system that estimates, for each card in your deck, the moment when you are about to forget it, and schedules a review at that point[3]. What follows is a map of the science behind that goal, the algorithms that pursue it, and the distance still remaining.

01

The Forgetting Curve — 140 Years and Still Standing

Ebbinghaus’s original finding was stark: without reinforcement, roughly 50% of newly learned information is forgotten within an hour, and around 70% within 24 hours. The decay follows an exponential curve — rapid at first, then gradually levelling off as the remaining memories prove more resistant to interference. But the crucial insight was not the forgetting itself; it was what happened when the material was reviewed. Each subsequent review reset the curve with a gentler slope — the spacing effect[1]. A fact reviewed after one day, then three days, then a week, was retained far longer than a fact crammed five times in a single session.
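The shape of the curve, and the way a review flattens it, can be sketched in a few lines. This is a deliberately simplified single-exponential model, not Ebbinghaus’s own fit: the `stability_hours` parameter, the starting value of 24 hours, and the growth factor of 2.5 per review are all illustrative assumptions.

```python
import math


def retention(hours_elapsed: float, stability_hours: float) -> float:
    """Probability of recall under a simple exponential model R = exp(-t / S)."""
    return math.exp(-hours_elapsed / stability_hours)


def review(stability_hours: float, growth: float = 2.5) -> float:
    """Hypothetical rule: each successful review multiplies stability by `growth`."""
    return stability_hours * growth


# Assumed starting stability of 24 hours (illustrative only).
stability = 24.0
for reviews_done in range(4):
    one_day_later = retention(24, stability)
    print(f"after {reviews_done} reviews: retention one day later = {one_day_later:.2f}")
    stability = review(stability)
```

Each pass through the loop leaves the curve with a gentler slope, so the same one-day gap costs less retention than it did before the review.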

Recent neuroscience has reconfirmed Ebbinghaus with modern precision: spacing and retrieval strategies align with the brain’s encoding, storage, and retrieval processes[6]. Reviewing material at increasing intervals (one day, three days, one week) significantly improves long-term retention because it forces the brain to reconstruct the memory trace each time, strengthening retrieval pathways rather than merely reinforcing recognition pathways. Spaced repetition revisits studied content at intervals chosen to intercept the forgetting curve just before the memory would otherwise be lost[9].

The theoretical frame is important. Spacing works not because it makes learning easier, but because it makes learning harder in a productive way. Each retrieval attempt is a form of effortful processing that strengthens the neural connections underlying the memory. Massed practice, by contrast, creates an illusion of fluency — the material feels familiar during the cramming session, but that familiarity does not translate to durable recall. The forgetting curve is not a bug in human cognition; it is a feature that, when properly leveraged, becomes the foundation of efficient long-term learning.

02

What the Brain Actually Does When You Remember

The neural basis of the spacing and testing effects has been mapped with increasing precision over the past decade. A 2021 study published in Frontiers in Human Neuroscience examined the long-term neural correlates of retrieval practice using fMRI. The researchers found that retrieval practice during training establishes a unique striatal-supramarginal network at retrieval — the Test Group showed greater activation in the left putamen and inferior parietal cortex near the supramarginal gyrus compared to controls who had only re-studied the material[4]. This is significant because the putamen is part of the basal ganglia, a region associated with procedural learning and habit formation, suggesting that retrieval practice may shift memories from effortful episodic recall toward more automatic, habit-like retrieval.

A complementary study (PubMed Central ID PMC7821628) demonstrated that retrieval practice facilitates learning by strengthening processing in both the anterior and posterior hippocampus[5]. Brain activity in the posterior hippocampus increased linearly as a function of the number of successful retrievals during initial learning. The anterior hippocampus, associated with encoding and novelty detection, showed a different pattern: its activity was modulated by the difficulty of the retrieval attempt. Together, these findings suggest that retrieval practice does not simply reinforce existing memory traces; it actively restructures them across hippocampal sub-regions.

At the cellular level, the mechanism is long-term potentiation (LTP) — the sustained strengthening of synaptic connections following repeated stimulation. When a memory is retrieved, the neural pathway that encodes it is reactivated, and LTP increases the efficiency of signal transmission along that pathway. Semantic memory consolidation occurs through this process: hippocampal processing results in increased synaptic strength, which over time allows the memory to be supported by neocortical networks independently of the hippocampus[6]. Each spaced retrieval is, in effect, a controlled dose of LTP applied to the specific neural circuit that the learner needs to strengthen.

03

From Leitner Boxes to SM-2 — A History of Scheduling

The practical problem of scheduling reviews has a surprisingly long history. In 1972, the German science journalist Sebastian Leitner published So lernt man lernen (“How to Learn to Learn”), introducing a cardboard box system that put the spacing effect into everyday practice. The system was elegant in its simplicity: flashcards were sorted into numbered boxes. A correct answer advanced the card to the next box, which was reviewed less frequently; an incorrect answer sent it back to Box 1, which was reviewed daily. The Leitner system is a simple heuristic, but more recent research has formalised it and provided the first rigorous means of optimising the review schedule[11].
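The box rules translate almost directly into code. The sketch below assumes five boxes and a doubling review spacing; only the daily review of Box 1 comes from the description above, and the `Card` class, `BOX_INTERVAL_DAYS` table, and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed review spacing per box, in days. Box 1 is daily, as in the text;
# the remaining values are illustrative.
BOX_INTERVAL_DAYS = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


@dataclass
class Card:
    front: str
    back: str
    box: int = 1


def answer(card: Card, correct: bool) -> Card:
    """Promote on a correct answer (capped at the last box); return to Box 1 on a miss."""
    if correct:
        card.box = min(card.box + 1, max(BOX_INTERVAL_DAYS))
    else:
        card.box = 1
    return card


def due_boxes(day: int) -> list[int]:
    """Boxes to review on a given study day, counting from day 1."""
    return [box for box, every in BOX_INTERVAL_DAYS.items() if day % every == 0]


card = Card("DAX", "nonsense syllable")
answer(card, correct=True)   # moves to Box 2
answer(card, correct=False)  # back to Box 1
print(card.box, due_boxes(4))
```

The whole schedule lives in one lookup table, which is what makes the system workable with cardboard and also what makes it blind to differences between learners and cards.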

The next leap came in 1987, when Piotr Wozniak, a Polish graduate student, created the SM-2 algorithm for his program SuperMemo. SM-2 assigned each flashcard an “easiness factor” (EF), a numerical estimate of how easy the card was for the user, and used it to calculate the interval before the next review. The formula was fixed: after the first two successful reviews, scheduled at one and six days, interval = previous_interval × EF. If you got a card wrong, EF decreased and the interval reset to one day. If you got it right, EF fell slightly for a hesitant answer, stayed the same for a good one, and rose for a perfect one, and the interval lengthened. A modified SM-2 became the scheduling engine behind Anki when Damien Elmes launched the open-source flashcard application in 2006, and through Anki it became the default for millions of learners worldwide[8].
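A minimal sketch of one SM-2 review step follows, based on the commonly published description of the algorithm rather than on SuperMemo’s or Anki’s actual source. The `CardState` class and function name are hypothetical, and implementations differ on details such as whether the interval uses the easiness factor from before or after the update; this version uses the value from before.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # current gap between reviews
    easiness: float = 2.5    # EF: starts at 2.5 and is floored at 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review with a self-rated quality from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the repetition sequence at a one-day interval.
        repetitions, interval = 0, 1
    else:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1

    # EF rises by 0.1 for quality 5, is unchanged for 4, and falls for 3 or lower.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    easiness = max(1.3, state.easiness + delta)
    return CardState(repetitions, interval, easiness)


state = CardState()
for grade in (5, 5, 5, 1):
    state = sm2_review(state, grade)
    print(grade, state)
```

Nothing in this update depends on any card other than the one being reviewed, or on any learner other than the one rating it, which is the limitation the next paragraph describes.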

But SM-2 has fundamental limitations. It treats all learners with the same fixed formula. It has no mechanism to learn from a user’s actual recall patterns over time — the easiness factor adjusts card by card, but the underlying model never updates its assumptions about how human memory works. It cannot distinguish between a card that was answered correctly after genuine recall and one that was answered correctly through a lucky guess. And it has no way to account for the well-documented fact that different types of material — vocabulary, anatomy, legal precedent — decay at different rates. SM-2 was a breakthrough in 1987. By 2020, it was showing its age.

04

The FSRS Revolution — Data-Driven Forgetting

The Free Spaced Repetition Scheduler (FSRS), open-sourced by Jarrett Ye in 2022, changed the field[3]. Where SM-2 used a fixed formula derived from Wozniak’s personal learning data, FSRS fits a statistical model to the user’s actual review history. It estimates, for each card at each moment, the probability that the user will successfully recall it. Reviews are scheduled at the point where that probability drops to the user’s target retention rate — typically 90%. The model learns from every review: a card that is consistently recalled easily will be shown less often; a card that keeps tripping the user up will be shown more.
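The scheduling step, choosing the day on which predicted recall falls to the target, can be sketched once a forgetting curve is fixed. The power-law form and the constants below follow the publicly documented FSRS 4.5 curve as best understood here, and should be read as an assumption rather than a statement of what any given Anki release runs; later versions are understood to make the decay trainable. Estimating `stability` from review history, which is the hard part of FSRS, is left out entirely.

```python
DECAY = -0.5
FACTOR = 19 / 81  # chosen so that retrievability is 0.9 when elapsed time equals stability


def retrievability(elapsed_days: float, stability: float) -> float:
    """Predicted probability of recall after `elapsed_days` without review."""
    return (1 + FACTOR * elapsed_days / stability) ** DECAY


def next_interval(stability: float, target_retention: float = 0.9) -> float:
    """Days until predicted recall drops to `target_retention` (inverse of the curve)."""
    return stability / FACTOR * (target_retention ** (1 / DECAY) - 1)


for target in (0.95, 0.90, 0.80):
    print(f"target {target:.2f}: review in {next_interval(10.0, target):.1f} days")
```

Under these assumptions a 90% target gives an interval equal to the card’s stability, and lowering the target stretches the interval, which is the knob a learner turns when trading retention against workload.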

The results have been striking. Benchmarks on more than 500 million Anki review logs show that FSRS needs 20–30% fewer reviews than SM-2 for the same retention rate[13]. That is not a marginal improvement: for a medical student reviewing 200 cards per day, it means 40 to 60 fewer reviews daily, or roughly 20 to 30 minutes saved at an assumed 30 seconds per card. FSRS was integrated into Anki as a built-in, opt-in scheduler in version 23.10, released in November 2023, and has continued to be refined through FSRS 4.5, 5, and 6. It requires approximately 1,000 reviews before per-user personalisation outperforms the default weights, a cold-start problem that is manageable for most regular users.

SM-2 approach
  • Fixed formula derived from one person’s learning data
  • Same scheduling logic for all learners
  • No mechanism to learn from actual recall patterns
  • Cannot distinguish genuine recall from lucky guesses
  • No adaptation to different material types

FSRS approach
  • Statistical model fitted to actual review history
  • Per-card recall probability estimation
  • Learns from every review, adjusting predictions continuously
  • 20–30% fewer reviews for the same retention rate
  • Open-source, peer-benchmarked on 500M+ review logs

05

Half-Life Regression — How Duolingo Learned to Teach

While the flashcard community was iterating on SM-2, a parallel line of research was developing inside Duolingo. In 2016, Burr Settles and Ben Meeder published “A Trainable Spaced Repetition Model for Language Learning” at the 54th Annual Meeting of the Association for Computational Linguistics (ACL)[2]. Their model, half-life regression (HLR), combined psycholinguistic theory with machine learning. The core idea was to estimate the “half-life” of each word in a student’s long-term memory — the time it takes for the probability of recall to drop to 50%. Words with short half-lives need frequent review; words with long half-lives can safely be left alone.
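The half-life idea reduces to two short formulas: recall probability p = 2^(−Δ/h), where Δ is the time since the word was last practised and h is its half-life, and h = 2^(θ·x), where x is a feature vector describing the learner’s history with the word and θ is a learned weight vector. The sketch below assumes that form; the feature names and weight values are invented for illustration and are not Duolingo’s.

```python
def half_life(weights: dict[str, float], features: dict[str, float]) -> float:
    """Estimated half-life in days: h = 2 ** (weights . features)."""
    dot = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 2.0 ** dot


def recall_probability(days_since_practice: float, half_life_days: float) -> float:
    """p = 2 ** (-delta / h); exactly 0.5 when delta equals the half-life."""
    return 2.0 ** (-days_since_practice / half_life_days)


# Hypothetical weights and features, for illustration only.
weights = {"bias": 1.0, "times_correct": 0.5, "times_wrong": -0.3}
features = {"bias": 1.0, "times_correct": 4.0, "times_wrong": 1.0}

h = half_life(weights, features)
print(f"half-life: {h:.1f} days, recall after 3 days: {recall_probability(3, h):.2f}")
```

Training then amounts to fitting the weights so that predicted recall matches what learners actually got right, which is what makes the model “trainable” in the sense of the paper’s title.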

The scale of validation was unprecedented. Using 13 million Duolingo student learning traces, Settles and Meeder demonstrated that HLR achieved a 45%+ error reduction compared to baseline models at predicting whether a student would correctly recall a word on a given day. In an operational study deployed to live users, the HLR-based review system improved daily engagement by 12% — students were more likely to return and practice when the system presented them with words at the optimal moment of difficulty. The model and dataset were open-sourced on GitHub[14].

Modern adaptive systems have pushed further. The MEMORIZE algorithm, proposed by Tabibian and colleagues in a 2019 PNAS paper, uses stochastic optimal control to trade off recall probability against the number of reviews[10]. Rather than simply scheduling the next review at a fixed decay threshold, MEMORIZE optimises the entire sequence of future reviews to maximise long-term retention per unit of study time. The approach represents a shift from reactive scheduling (reviewing when you’re about to forget) to proactive scheduling (planning the optimal learning trajectory from the start). Adaptive models of this kind fit each individual’s “forgetting rhythm”, and the evidence so far suggests they can do so better than a fixed formula[11].

06

The Implementation Gap

Despite decades of converging evidence from cognitive psychology, neuroscience, and computer science, most classrooms still operate on a massed-practice model. Textbook chapters followed by end-of-chapter tests. Revision weeks that compress a semester into a few days. Standardised assessments that reward short-term recognition over long-term retrieval. The gap between what the science shows and what the education system does remains stubbornly wide.

Closing that gap requires action on several fronts. First, spaced retrieval must be embedded in learning management systems by default, not as an optional plugin that tech-savvy teachers discover on their own. Second, teacher training on the spacing and testing effects, currently absent from most education programmes, needs to become a standard component of pre-service and in-service professional development. Third, AI-powered platforms are needed at scale to handle the scheduling complexity so that students and teachers do not have to. The algorithmic machinery of FSRS and HLR is sophisticated, but from the learner’s perspective it should be invisible: the system simply shows you the right card at the right time.

Fourth, and perhaps most importantly, the field needs open research and open tools. FSRS is open-source, and its benchmarks are publicly reproducible. Duolingo open-sourced its HLR dataset. But most commercial EdTech platforms still use proprietary scheduling algorithms that are neither peer-reviewed nor publicly benchmarked, so their performance against the open alternatives cannot be independently verified. The case for acting now is strong. A 2025 study published in Frontiers in Medicine on implementing spaced repetition in paediatrics education found that even modest integration of spaced retrieval into existing curricula produced measurable improvements in knowledge retention[7]. A 2024 review in the International Journal of Asian Social Science Research examined spaced repetition and retrieval practice from a cognitive psychology perspective, concluding that AI-powered spaced repetition systems represent the most promising path toward closing the implementation gap[9].

The science is not waiting for the education system to catch up. The algorithms are improving. The evidence base is growing. The question is whether institutions will adopt what works — or continue to rely on practices that cognitive science abandoned decades ago.

References

  1. Ebbinghaus, H. (1885/1913). Memory: A Contribution to Experimental Psychology. Translation by Ruger & Bussenius, Teachers College, Columbia University. The foundational text establishing the forgetting curve and the spacing effect.
  2. Settles, B. & Meeder, B. (2016). A trainable spaced repetition model for language learning. Proceedings of the 54th Annual Meeting of the ACL, 1848–1858. doi:10.18653/v1/P16-1174. Introduces half-life regression (HLR) and validates it on 13 million Duolingo learning traces.
  3. Ye, J. (2022). Optimizing spaced repetition schedule by capturing the dynamics of memory. FSRS, open-sourced. github.com/open-spaced-repetition/fsrs4anki. The FSRS algorithm, available as a built-in scheduler in Anki since 2023.
  4. Marin-Garcia, E., Mattfeld, A.T., & Gabrieli, J.D.E. (2021). Neural correlates of long-term memory enhancement following retrieval practice. Frontiers in Human Neuroscience, 15, 584560. doi:10.3389/fnhum.2021.584560. Demonstrates a unique striatal-supramarginal network established by retrieval practice.
  5. Wiklund-Hörnqvist, C., Stillesjö, S., Andersson, M., Jonsson, B., & Nyberg, L. (2021). Retrieval practice facilitates learning by strengthening processing in both the anterior and posterior hippocampus. Brain and Behavior, 11(1), e01909. PMC7821628. Posterior hippocampal activity increases linearly with successful retrieval count.
  6. Wollstein, Y. & Jabbour, N. (2022). Spaced effect learning and blunting the forgetfulness curve. Sage Journals. doi:10.1177/01455613231163726. Reconfirms alignment of spaced repetition with the brain’s encoding, storage, and retrieval processes.
  7. Frontiers in Medicine (2025). Implementation of a spaced-repetition approach to enhance undergraduate learning and engagement in paediatrics. doi:10.3389/fmed.2025.1601614. Demonstrates measurable improvements from integrating spaced retrieval into medical education.
  8. AI-Enhanced Spaced Repetition: Integrating the Ebbinghaus Forgetting Curve and SM-2 Scheduling (2024). Academia.edu. Overview of the SM-2 algorithm and its integration with modern AI approaches.
  9. Spaced Repetition and Retrieval Practice: Efficient Learning Mechanisms from a Cognitive Psychology Perspective and Their Empowerment by AI (2024). International Journal of Asian Social Science Research. Zeus Press. Comprehensive review concluding AI-powered spaced repetition represents the most promising path forward.
  10. Tabibian, B., et al. (2019). Enhancing human learning via spaced repetition optimization. PNAS, 116(10), 3988–3993. doi:10.1073/pnas.1815156116. PMC6410796. Introduces the MEMORIZE algorithm using stochastic optimal control.
  11. Reddy, S., et al. (2016). Unbounded human learning: optimal scheduling for spaced repetition. arXiv:1602.07032. Theoretical framework for optimising spaced repetition schedules without fixed bounds on review count.
  12. Anapolo: A web-based spaced repetition e-learning platform for enhanced long-term memory retention (2024). ACM ICEEL. doi:10.1145/3719487.3719520. Practical implementation of spaced repetition in a web-based learning platform.
  13. Benchmarks: FSRS vs SM-2 performance data compiled from Anki user logs (500M+ reviews). studyglen.com. Empirical comparison showing 20–30% fewer reviews for equivalent retention.
  14. Duolingo Research. Half-life regression: dataset (13M traces) and source code. github.com/duolingo/halflife-regression. Open-source dataset and implementation of the HLR model.