The Students Who Got Better at Reading by Doing Puzzles

What economists found when they gave kids puzzles instead of math, and what it reveals about every student who has ever faded in the second half of a test.

The short version: The gap between struggling students and their peers is not just about what they know. It is about how long they can keep thinking. And that capacity responds to better schools and more practice in sustained, independent work.

The setting

In the spring of 2019, a group of elementary school students in Lucknow, India, sat down for their end-of-year exams. Some of them had spent part of their school week, for the past five months, solving mazes, tangrams, and other puzzles on bare-bones tablets. Others had spent that same time practicing math problems. A third group attended a standard study hall.

The puzzle group’s Hindi grades went up. So did their English grades.

There is no obvious reason this should have happened. The mazes contained no letters. The tangrams contained no vocabulary. The students who solved them learned nothing that could plausibly transfer to a language exam, and yet the grades moved. And they moved by roughly the same amount as the grades of students who spent that same time practicing actual math problems.

That is the puzzle this paper sets out to explain. The answer, it turns out, has less to do with what these students practiced and more to do with the act of practicing itself.

The problem hiding inside every standardized test

Before getting to the experiment, consider something that happens on every long exam. Questions near the end are harder to get right than those near the beginning, even when difficulty is held constant. This is not because students run out of knowledge. It is because sustained thinking depletes mental resources over time, and performance degrades as a result.

The authors of this study document this systematically using two of the most widely administered academic assessments in the world: TIMSS (Trends in International Mathematics and Science Study), given to fourth-graders across more than 50 countries, and PISA (Programme for International Student Assessment), given to 15-year-olds across more than 60 countries. In both datasets, question order is randomized across students, meaning a question that appears late in the test for one student might appear early for another. That randomization allows the researchers to isolate the effect of position alone.

The pattern is consistent across subjects and countries: students are more likely to get a question wrong simply because it appears later, after they have already been thinking for a while.
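To make the logic of that comparison concrete, here is a minimal Python sketch. It is not the authors' code or their estimator. It assumes a hypothetical record format of (question_id, position, correct) and simply compares, for each question, accuracy when the question appeared in the second half of the test against accuracy when it appeared in the first half. Because the same question is compared with itself, difficulty is held fixed. The function name and the toy data are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def position_effect(responses, test_length):
    """Average within-question change in accuracy, second half minus first half.

    responses: iterable of (question_id, position, correct) tuples, where
    position is 1-based and correct is a bool. This format is hypothetical.
    """
    half = test_length / 2
    early = defaultdict(list)
    late = defaultdict(list)
    for question_id, position, correct in responses:
        bucket = early if position <= half else late
        bucket[question_id].append(1.0 if correct else 0.0)

    # Only questions seen in both halves can be compared with themselves.
    gaps = [
        mean(late[question_id]) - mean(early[question_id])
        for question_id in early.keys() & late.keys()
    ]
    return mean(gaps)


# Toy data, invented for illustration only (not from TIMSS or PISA).
toy = [
    ("q1", 1, True), ("q1", 2, True), ("q1", 9, True), ("q1", 10, False),
    ("q2", 3, True), ("q2", 4, False), ("q2", 7, False), ("q2", 8, False),
]
effect = position_effect(toy, test_length=10)
print(effect)
```

A negative value means the same questions were answered less accurately when they appeared late. Comparing this number across groups of students is one simple way to ask who fades fastest.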

But the economists add to this well-documented pattern by looking at who fades fastest. In TIMSS, Black and Hispanic students in the United States show a 72% steeper performance decline over the course of the test than white students. Globally, students in poor countries decline at three times the rate of students in rich countries. Students from more advantaged backgrounds often start the test with higher scores, and fatigue-driven gaps add a further disadvantage on top of that. These gaps aren’t about what students know at the beginning; they emerge from differences in how quickly mental effort degrades as the test goes on.

The economists also find a suggestive explanation. In schools serving more disadvantaged students, students spend considerably less time in focused, independent cognitive work during the school day. In poor countries, students spend about 40% less time in independent practice than their wealthier counterparts. In the United States, students from lower-income households spend roughly 10% less time in such practice. The economists believe this independent practice, i.e., working alone without a teacher’s constant help, may be important because it forces students to deliberately exert their own attention. Students who receive more of this kind of practice also tend to show a less steep decline over the course of a test. The experiment was designed to test whether that relationship is causal.

The method, plain English

The study recruited 1,636 students in grades one through five across six low-income private primary schools in Lucknow, India. Students were randomly assigned, at the individual level, to one of three conditions.

The first group, the Math arm, practiced math problems on simple tablets (without any flashy apps, animations, or distractions) using software that adjusted difficulty to each student’s level. No instruction, just problems. This was designed to mimic what good schooling does: independent, effortful academic work for a sustained stretch.

The second group, the Games arm, played cognitively demanding puzzles on the same bare-bones tablets: mazes, tangrams, and similar games containing no letters, no numbers, and no academic content of any kind. This arm was the more important test of the researchers’ theory. If sustained effortful thinking matters rather than academic content specifically, the Games students should show gains comparable to those of the Math students. If academic content is what matters, they should not.

The third group, the control, attended a standard study hall. A teacher wrote a few problems on a chalkboard and sat down. Most students, particularly those below grade level, did not engage meaningfully with the work. The control group also had some tablet time for simple tasks to ensure the results weren’t just due to the novelty of the technology.
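Individual-level random assignment of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's actual procedure: the article does not say whether the three arms were equal in size or whether assignment was stratified by school or grade, so the equal split, the seed, and the function name below are all hypothetical.

```python
import random

ARMS = ("math", "games", "control")


def assign_arms(student_ids, seed=0):
    """Shuffle students, then deal them into the three arms in turn.

    Equal-sized arms are an assumption made for this sketch only.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: ARMS[i % len(ARMS)] for i, sid in enumerate(shuffled)}


# 1,636 is the sample size reported in the article; the IDs are placeholders.
assignment = assign_arms(range(1636))
counts = {arm: sum(1 for a in assignment.values() if a == arm) for arm in ARMS}
print(counts)
```

Randomizing at the level of the individual student, rather than the classroom or school, means that students in all three arms share the same teachers and classmates, so differences in outcomes are harder to attribute to anything other than the assigned activity.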

Sessions ran one to three times per week, for roughly 20 minutes each, from August through January. In total, treated students accumulated between 10 and 20 hours of additional cognitive practice. Teachers were kept unaware of student assignments throughout, and staff were rotated between groups to keep the experiment fair.

Academic grades in Hindi, English, and math were collected from the schools at midyear and year’s end. The research team also administered its own tests in three domains: listening comprehension, fluid intelligence using Raven’s progressive matrices (a pattern-based IQ test requiring no reading), and math. Question order was randomized within each test, giving the team a clean way to measure whether treated students held up better in the second half of a test than control students did. Note that every student had more than enough time to finish, so the drop-off in the second half reflected mental fatigue rather than time pressure.

What they found

Puzzles improved reading grades. Students who received cognitive practice saw a significant boost in their grades across Hindi, English, and math. To put the size of that improvement in perspective, it was nearly as large as the gains children achieve when schools dramatically reduce class sizes for an entire year, one of the most-studied interventions in education research. That result came from only 10 to 20 hours of puzzle practice, not a full year of smaller classes.

The subject matter didn’t matter. The Math arm and the Games arm produced nearly identical results. Students who spent their time solving mazes improved their reading and language grades by essentially the same amount as students who practiced math problems. Since the mazes had no words or letters, there was no lesson to transfer. The evidence suggests that sustained cognitive effort itself played an important role, although the study cannot completely rule out other mechanisms.

Students faded more slowly. On long exams, typical students were about 12% less likely to get a question right in the final stretch than they were at the start. Students who had practiced their endurance showed 22% less of that fade. They did not necessarily start the test knowing more, but they were able to use what they knew for longer.
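A quick back-of-the-envelope calculation shows what those two numbers imply together. The sketch below assumes that "22% less of that fade" is a relative reduction applied to the roughly 12% baseline decline. That reading is an interpretation of the summary figures above, and the paper's own estimates may be defined differently.

```python
# Figures quoted in the article.
baseline_fade = 0.12    # typical drop in accuracy by the final stretch
fade_reduction = 0.22   # treated students showed 22% less of that fade

# Assumption: the 22% is a relative reduction of the baseline fade.
treated_fade = baseline_fade * (1 - fade_reduction)
print(f"Implied fade for treated students: {treated_fade:.1%}")
```

Under that assumption, treated students would still fade by a little over 9%, compared with about 12% for typical students. The practice narrowed the decline; it did not eliminate it.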

The gains were long-lasting. When researchers returned three to five months after the program ended, after a summer break and a grade transition, students still showed the same improvement in staying power. The effect had not worn off.

Motivation is not the explanation. You might think these kids were just trying harder, but the evidence says otherwise. Researchers offered a separate group of students highly coveted prizes for scoring well on tests. The prizes made students perform better at the very start of a test, when they were fresh. But the prizes did nothing to slow the fade as the test wore on. Wanting to do well is not the same as having the capacity to keep going.

What this study can and cannot tell us

The study was conducted in low-income private schools in a single city in India. How far these findings travel to other educational contexts, including U.S. public schools, schools with more resources, or older students, is an open question. The researchers are careful about this, and the paper does not claim generalizability beyond what the design supports.

The persistence finding, while encouraging, should be read as suggestive rather than definitive. Effects measured three to five months after the intervention are consistent with durable gains, but longer-term follow-up data, which the researchers were unable to collect due to disruptions from the COVID-19 pandemic, would be needed to say more.

The paper also uses enrollment cutoff data from Pakistan to estimate whether an additional year of schooling improves cognitive endurance. The authors are explicit that this evidence is suggestive. It complements the experimental findings but does not carry the same causal weight.

So what?

The paper closes with two examples of mental fatigue appearing outside schools: among data-entry workers and voters navigating long ballots. In both cases, the same pattern emerges. Less educated workers experience declines in accuracy roughly twice as fast as more educated workers. Voters in less advantaged precincts become increasingly likely to default to the listed option as the ballot grows longer, at a faster rate than voters in more advantaged precincts. These examples are suggestive, not causal, but they raise the question of whether the consequences of unequal cognitive endurance extend well beyond childhood test scores.

For anyone who works with students, the practical implication runs parallel to what Cotton et al.’s productivity paper found about homework completion: the barrier may not be what it appears to be from the outside. A student who fades in the second half of a test is not necessarily less prepared or less motivated. She may simply have had fewer opportunities to practice hard, sustained thinking in a school environment that never asked her to.

The researchers note that some schools with strong outcome records in low-income U.S. settings place heavy emphasis on frequent testing and sustained independent work. Whether or not those schools designed their practices with this in mind, the incidental training may be part of what makes them effective.

The broader implication is a reframing of what schools are doing when they work well. Schooling transmits knowledge and skills. But it may also, whether it intends to or not, train the raw capacity to keep thinking when it gets hard. Whether a school builds that capacity, and for which students, is a question worth taking seriously.