Inside soccer’s data renaissance

Imagine tuning in to the opening kickoff of a World Cup match and seeing a player intentionally send the ball all the way down the pitch and right out of bounds on the opponent’s end. Casual fans might scratch their heads. Where’s the logic in surrendering possession seconds into a game? If you were Jesse Davis, though, you’d know that this play could be a prime setup to score. 

Davis is a professor of computer science at KU Leuven in Belgium and head of its Sports Analytics Lab, which has been at the vanguard of a data awakening in soccer since its inception more than a decade ago. Though the research group brings machine-learning models to bear on a variety of sports (including basketball, volleyball, and field hockey), nowhere is its impact felt more than on the soccer pitch.

Davis and his team of researchers employ advanced data analytics to reveal a range of (beg your pardon) game-changing findings that are shifting pro clubs’ decision-making. “His lab is the most influential sports analytics lab in soccer,” says Hugo Rios-Neto, data recruitment lead for Royal Sporting Club Anderlecht in Belgium. They’ve helped teams better evaluate their rosters, conceived ways to assess how efficient (or not) strategies are, and developed algorithms that uncover hidden tactical patterns.

Take, for instance, the value of kicking the ball out of bounds near the opponent’s goal and letting the other team throw it back into play, a move that’s been popping up in some of the world’s top leagues over the last few years.

To make the statistical argument for this seemingly counterproductive move, Davis’s group built a training data set of more than 1.4 million passes and some 60,000 throw-ins, partly drawn from the 2022 World Cup. They used tree ensemble models (essentially a mashup of decision trees) to simulate the tactic. The conclusion, which the researchers presented in a 2024 paper under the apt title “Boot it”: When the ball is in the middle third of the pitch, kicking it out of bounds on your opponents’ side of the field can put you within 10 actions (think passes and dribbles) of a goal. That can be a big deal in a sport with 1,500 or more actions per match and very little scoring. The idea, Davis explains, is that you’re setting yourself up to recover the ball in an advantageous situation.
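Models like these need a target to learn from. Below is a minimal sketch, assuming a simplified action record with hypothetical fields (`team`, `kind`, `is_goal`), of how one might label each action by whether the acting team scores within a short window of actions. The paper’s actual labels and features are not described here, so this only illustrates the general idea of a look-ahead window.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical, simplified on-ball event (not a real data provider's schema)."""
    team: str              # team performing the action
    kind: str              # e.g. "pass", "throw_in", "interception", "shot"
    is_goal: bool = False  # True if this action produced a goal


def scores_within(actions: list[Action], k: int = 10) -> list[bool]:
    """Label each action True if the acting team scores in a window of k actions
    that starts at (and includes) that action."""
    labels = []
    for i, action in enumerate(actions):
        window = actions[i : i + k]
        labels.append(any(a.is_goal and a.team == action.team for a in window))
    return labels


sequence = [
    Action("A", "pass"),          # A boots the ball out of bounds
    Action("B", "throw_in"),      # B throws it back into play
    Action("A", "interception"),  # A wins the ball back
    Action("A", "shot", is_goal=True),
]
print(scores_within(sequence, k=4))  # [True, False, True, True]
```

Rows labeled this way, paired with features describing each game state, are the kind of target a tree-ensemble classifier could be trained on.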

Beyond providing discrete game-day insights, Davis occupies a distinctive niche as an academic in a field where many clubs now hire their own internal data teams to maintain a competitive edge. He makes most of his research freely available through open-source analytics tools, and academic life also affords him the freedom to tackle more complex problems, like standardizing in-game data, a project that will make it easier to parse game footage and come up with winning strategies.


Davis, 45, grew up in Wisconsin and spent his childhood enraptured by basketball and (American) football. Soccer was largely a nonentity to him until college, when the 2002 World Cup—in which Brazil famously swept the tournament—reeled him in. But the notion of going on to dissect the sport never crossed his mind. His doctoral studies in computer science at the University of Wisconsin–Madison had him working with radiologists to analyze mammography reports. 

In October 2010, he joined KU Leuven as a computer science professor looking at the intersection of AI and health care, with a focus on monitoring athletic performance. His research team studied, for instance, combining things like heart rate with other metrics to determine whether someone was overtraining. They also dove into the biomechanics of running.

The tactical and technical aspects of sports, and soccer specifically, became the subject of Davis’s professorial work when he hired Jan Van Haaren, an engineering student focused on artificial intelligence and a self-described soccer fanatic. Van Haaren wondered whether data analysis could be used to study things like passing, shooting, and ball progression, metrics the game was only just beginning to crunch digitally at the time.

You need not be well versed in the moneyball-ization of pro sports to see that it’s relatively easy to apply deep statistical work to baseball or basketball. You can isolate actions like jump shots and assign value to ones taken close or far away. Soon a basketball coach realizes that a player who can’t make a layup, but shoots roughly as well from the three-point line as on mid-range jumpers, might as well go for the shot that gets more points. 
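The coach’s reasoning is an expected-value calculation: points per attempt equal the make probability times the shot’s value. Here is a minimal sketch with a hypothetical shooter who makes 40% of both mid-range jumpers and threes.

```python
def expected_points(make_prob: float, points: int) -> float:
    """Expected points per attempt = probability of making the shot times its value."""
    return make_prob * points


# Hypothetical shooter: the same 40% make rate from mid-range and from three.
mid_range = expected_points(0.40, 2)
three = expected_points(0.40, 3)
print(f"mid-range: {mid_range:.2f} pts/shot, three: {three:.2f} pts/shot")
# mid-range: 0.80 pts/shot, three: 1.20 pts/shot
```

More generally, because 3 × p₃ > 2 × p₂ exactly when p₃ > (2/3) × p₂, the three is the better shot whenever its make rate is more than two-thirds of the mid-range rate.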

Soccer, by comparison, seemed like a poor candidate for that kind of analysis. “The vast, vast majority of actions really don’t lead to the outcome of a goal or even a shot,” says Rios-Neto. “So it’s hard to elaborate or derive a winning strategy from the data.”

But Van Haaren’s love of the sport, and Davis’s love of sports in general, inspired them to try. Over time, Davis realized that machine learning and other artificial-intelligence tools lent themselves well to the complexity, fluidity, and speed of soccer. In 2014, he officially stood up the Sports Analytics Lab. 

With a stable of about 10 students and postdocs at any one time, the lab began laying what Van Haaren calls the “intellectual foundations of how the game is analyzed today.” The researchers picked apart in-game actions, and suddenly they were valuing ball possession, penalty-kick strategy (aim for the center), and the merits of long shots on goal (take them). “One of the trends that’s been in soccer over the last five to 10 years is that the number of long shots has dramatically increased,” says Davis. “What the data let you do is really quantify what the probabilities of those things are.”
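Quantifying a probability such as a long-shot conversion rate also means stating how uncertain the estimate is. Below is a minimal sketch using the Wilson score interval, a standard interval for a binomial proportion. The shot counts are hypothetical, not figures from the lab.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success probability (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return center - half_width, center + half_width


# Hypothetical: 5 goals from 100 long shots.
low, high = wilson_interval(5, 100)
print(f"conversion rate 5.0%, 95% interval {low:.1%} to {high:.1%}")
```

With only 100 attempts the interval is wide, which is one reason large event data sets matter for questions like this.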


In the years since Davis and his team started untangling individual soccer tactics, their ideas have begun to permeate clubs across Europe, like Belgium’s Club Brugge KV, as well as national soccer organizations in the US and Belgium. “The work coming out of the lab is genuinely useful,” Rios-Neto says, “and clubs apply it for a range of purposes.”

Van Haaren, who’s now the director of football intelligence at Club Brugge, is one of many in-house analysts adapting the lab’s work to the pro game. “Our collaboration with the lab is centered on translating [the team’s] football philosophy into measurable, data-driven outputs,” he says. When a club wants to assess, say, how well a center-back is moving the ball down the field, it aims to tally how many times the ball ended up in the part of the pitch closest to the opposing team’s goal. It does this by combining event data, which records actions on the ball, with tracking data, which records player movement. This shows how well players fulfill their roles, which is useful in development and also when scouting for new recruits. 
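Below is a minimal sketch of the simpler, event-data half of that tally: counting a player’s completed passes that move the ball into the final third. The field names, the 105-meter pitch length, and the convention that x runs from a team’s own goal line toward the opponent’s goal are all assumptions (data providers use different coordinate systems), and the tracking-data side is not modeled here.

```python
from dataclasses import dataclass

PITCH_LENGTH_M = 105.0  # assumed pitch length in meters


@dataclass
class Pass:
    """Hypothetical pass event; x is measured toward the opponent's goal."""
    player: str
    start_x: float
    end_x: float
    completed: bool


def final_third_entries(passes: list[Pass], player: str,
                        pitch_length: float = PITCH_LENGTH_M) -> int:
    """Count completed passes by `player` that start outside the final third
    and end inside it."""
    threshold = 2 * pitch_length / 3  # where the final third begins
    return sum(
        1
        for p in passes
        if p.player == player and p.completed and p.start_x < threshold <= p.end_x
    )


passes = [
    Pass("center_back", 30.0, 75.0, True),   # enters the final third: counted
    Pass("center_back", 40.0, 80.0, False),  # incomplete: not counted
    Pass("center_back", 72.0, 90.0, True),   # already in the final third: not counted
    Pass("striker", 50.0, 90.0, True),       # different player
]
print(final_third_entries(passes, "center_back"))  # 1
```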

Davis’s lab, meanwhile, is continuing to ask questions that apply to the game writ large. To determine whether there’s an advantage to taking more long shots, for instance, postdoc Maaike Van Roy and colleagues modeled the behavior of English Premier League teams using a Markov decision process, a computational framework in which some outcomes are under a decision-maker’s control while others are random. (That duality is particularly useful for soccer, where movement can feel anything but linear.) The results, presented in 2021 at the MIT Sloan Sports Analytics Conference, showed that Chelsea could gain 1.6 more goals per season by shooting from distance 20% more often.
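A toy version of the idea follows: a minimal sketch, assuming a three-state possession model with made-up transition probabilities (not the lab’s estimates), solved by value iteration. In each state the attacking team chooses to shoot or to move the ball forward (the controlled part), and the outcome of that choice is random.

```python
# Hypothetical model: MODEL[state][action] -> list of (probability, next_state, reward).
# "goal" and "lost" are terminal; a goal earns reward 1. All numbers are invented.
MODEL = {
    "midfield": {
        "move": [(0.6, "edge_of_box", 0.0), (0.4, "lost", 0.0)],
        "shoot": [(0.02, "goal", 1.0), (0.98, "lost", 0.0)],
    },
    "edge_of_box": {
        "move": [(0.5, "box", 0.0), (0.5, "lost", 0.0)],
        "shoot": [(0.08, "goal", 1.0), (0.92, "lost", 0.0)],
    },
    "box": {
        "move": [(0.3, "box", 0.0), (0.7, "lost", 0.0)],
        "shoot": [(0.15, "goal", 1.0), (0.85, "lost", 0.0)],
    },
}
TERMINAL = ("goal", "lost")


def q_value(values, outcomes):
    """Expected reward plus next-state value for one (state, action) pair."""
    return sum(p * (r + values[nxt]) for p, nxt, r in outcomes)


def value_iteration(model, tol=1e-12):
    """Undiscounted value iteration; every action here can end the possession,
    so values stay finite and the loop converges."""
    values = {s: 0.0 for s in list(model) + list(TERMINAL)}
    while True:
        delta = 0.0
        for state, actions in model.items():
            best = max(q_value(values, outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(actions, key=lambda a: q_value(values, actions[a]))
        for state, actions in model.items()
    }
    return values, policy


values, policy = value_iteration(MODEL)
print(policy)  # {'midfield': 'move', 'edge_of_box': 'shoot', 'box': 'shoot'}
```

In this toy model, moving on from the edge of the box is worth 0.5 × 0.15 = 0.075 goals, so shooting from there is the better choice whenever its scoring probability exceeds 0.075. That is the kind of trade-off the framework makes explicit.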

Despite those kinds of insights from Davis’s lab and similar research groups that have sprung up over the last decade at institutions like MIT and Carnegie Mellon, soccer somewhat lags behind many other pro sports when it comes to collecting the data that analysts need. All teams employ people to watch video and use software to annotate specific in-game tactics—the details of which may make sense only to the most devoted fans. It’s a mostly manual process, one that can take up to six hours per game. “It’s a complete nightmare as a data analyst to work with,” says Davis.

So while the lab plays on, Davis has also joined up with researchers from other institutions in an effort to standardize data across all matches. The group is experimenting with transformers, the neural network architecture that underpins large language models like ChatGPT. If you can bring that to the world of soccer, a human game annotator could tag a tactic—a three-on-two breakaway, say—a few times, and that could train the model on the concept so it could tag subsequent instances on its own. “There’s been a lot of progress,” Davis says. “But it still remains quite hard.”
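The few-shot workflow can be illustrated without a transformer. This minimal sketch swaps in a much simpler technique, nearest-prototype classification: each tag is represented by the average of the few examples an annotator labeled, and a new clip gets the tag whose average is closest. The two-number feature vectors and tag names are hypothetical. In the approach Davis describes, a learned model would supply far richer representations.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_prototypes(examples: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Turn a handful of annotated examples per tag into one prototype per tag."""
    return {tag: centroid(vectors) for tag, vectors in examples.items()}


def predict_tag(features: list[float], prototypes: dict[str, list[float]]) -> str:
    """Return the tag whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda tag: math.dist(features, prototypes[tag]))


# Hypothetical features: [attackers near the ball, defenders between ball and goal]
annotated = {
    "three_on_two": [[3, 2], [3, 2], [4, 2]],
    "set_defense": [[4, 8], [5, 9]],
}
prototypes = build_prototypes(annotated)
print(predict_tag([3, 2], prototypes))  # three_on_two
print(predict_tag([5, 8], prototypes))  # set_defense
```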

If we’re keeping score, though, the lab’s work has already made the analytics process easier thanks to open-source tools it’s put out there—some of which clock thousands of downloads a month. One is a framework called VAEP, a model that assesses the effects of all actions on the ball. Another is an xG (expected goals) model, which looks at the quality of a scoring chance. Still another is a package to synchronize event data with tracking data. “Lots of people in industry use our code in their daily workflows,” Davis says.
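VAEP’s published formulation values an action by the change it causes in the acting team’s short-term probability of scoring, minus the change in its probability of conceding. Below is a minimal sketch of that final arithmetic step, assuming the per-state probabilities have already been estimated by some classifier. The function and argument names are hypothetical (this is not the lab’s package API), and edge cases such as resets after goals are simplified.

```python
def vaep_values(teams: list[str], p_scores: list[float],
                p_concedes: list[float]) -> list[float]:
    """Value each action as (change in P(score)) - (change in P(concede)).

    teams[i]: team performing action i.
    p_scores[i], p_concedes[i]: estimated probabilities, from that team's
    perspective, of scoring / conceding soon after action i.
    """
    values = []
    for i, team in enumerate(teams):
        if i == 0:
            # Assumption for this sketch: no prior state, so start from zero.
            prev_scores, prev_concedes = 0.0, 0.0
        elif teams[i - 1] == team:
            prev_scores, prev_concedes = p_scores[i - 1], p_concedes[i - 1]
        else:
            # Possession changed: the opponent's "concede" is our "score", and vice versa.
            prev_scores, prev_concedes = p_concedes[i - 1], p_scores[i - 1]
        values.append((p_scores[i] - prev_scores) - (p_concedes[i] - prev_concedes))
    return values


result = vaep_values(["A", "A", "B"], [0.02, 0.05, 0.01], [0.01, 0.01, 0.03])
print([round(v, 3) for v in result])  # [0.01, 0.03, 0.02]
```

The third action, by team B, earns positive value because it lowers B’s chance of conceding relative to A’s previous scoring threat.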

For him, the practical application of having their code out there is important, but the real (ahem) kick is watching theory become practice. As he says, “I’m really motivated to solve problems that arise in real settings and see my work have an impact.” 

Andrew Zaleski is a contributing writer at Washingtonian magazine. 

Job titles of the future: Nature’s drug designer

In 2018, after nearly two decades working in Big Pharma, chemist Tim Cernak was ready to put his skills to a new use. 

For Merck, he’d developed precision therapies for cancer, HIV, and diabetes that could target disease while minimizing harm to healthy cells. But as a lifelong nature lover, he was increasingly concerned about the health of ecosystems and wondered whether his expertise could transfer. Animals, he learned, are often treated with pharmaceuticals formulated for humans, which affect them like old-school cancer drugs: Though intended to kill abnormal cells, they’re indiscriminate in the harm they cause. For instance, the standard of care for frogs with a deadly fungal skin infection is itraconazole, an antifungal that is often lethal to the frogs themselves.

Cernak imagines a world where “the patient was always meant to be a frog in the first place, from the beginning to the end.” Now an associate professor at the University of Michigan, he’s worked on all types of creatures, from a Gila monster with a parasite to bald eagles with avian flu. Here’s what it takes to treat nature’s patients.

Experience with protein-modeling software 

Developing any type of drug is extremely expensive, failure-prone, and slow-going. But AI can speed up the entire drug-design workflow, says Cernak. Google DeepMind’s AlphaFold model allows him to visualize a mutant protein’s three-dimensional structure on a screen, rather than growing it on a plate (the traditional methodology), and then quickly generate possible new drugs that would latch onto that structure. The next step is to run a series of reactions and see which potential drugs may be effective; with the help of robots in the lab, he can speed through as many as 1,500 per day.

Curiosity about creatures of all sizes

Cernak isn’t selective with his patients. For example, he worked on a treatment for loggerhead sea turtles after he was shocked to learn that the iconic species suffered from contagious tumors. He feels especially drawn to creatures that have helped humans, like the Gila monster, whose hormones have informed popular weight-loss drugs like Ozempic. And it’s not just animals; he’s also developing a precision insecticide to treat hemlock trees under attack from invasive species. 

A pioneering spirit

Cernak refers to this new discipline as “conservation chemistry.” The pairing carries a loaded history, from DDT decimating US bald eagle populations in the 1960s to cow painkillers killing millions of Indian vultures in the ’90s. Cernak recognizes the risks, but he feels that excluding chemists from conservation is a missed opportunity.

“I’m just sick of looking at the chemical tools that are used in the conservation space, and they’re not cutting-edge,” he says. “It’s like, how do you have this super high-tech engine over here for making human medicines, while we’re living through a mass extinction?” 

Anna Gibbs is a journalist who covers the intersection between science and society.

Opinion: How long Covid’s scientific stalemate made it politically erasable

Mitchell Miglis had two months left. The Stanford University neurology professor had spent two years studying what long Covid does to the human nervous system — why patients’ hearts race when they stand, why their blood pressure collapses, why their bodies lose the ability to regulate themselves. His National Institutes of Health RECOVER grant was weeks from completion, data collected, analysis underway.

On March 25, 2025, a termination notice arrived. The grant was “incompatible with agency priorities.” No modification could bring it into alignment. “This is not only disappointing and demoralizing from a scientific perspective,” Miglis wrote in the Sick Times, a publication about long Covid, “but in a broader sense, as a clinician who sees these patients every day, a much larger disappointment to the patient community.”

STAT+: Scientists see promise in NIH proposal to cap number of grants they receive

During his 25 years at the National Institutes of Health, where he served first as the head of one of its institutes before becoming principal deputy director and later acting director, Lawrence Tabak took many trips to universities around the country to talk to researchers. He made a point of prioritizing state schools and smaller institutions.

On those visits, there was never a shortage of researchers brimming with ideas they hoped would attract the funding needed to pursue them. But without easy access to leaders in their fields or top-of-the-line lab equipment, researchers outside top universities often struggle to compete for NIH grants.

“There was never an institution I went to that I wasn’t blown away by a few young people,” Tabak said. “But it made me upset, because I realized the maldistribution of resources was compromising their ability to reach their potential.” 

One proposal that’s been floated several times to help spread the wealth is to cap the number of grants individual researchers can receive from the NIH. Before now, it was most recently proposed in 2017 but was quickly walked back by the first Trump administration after pushback from high-rolling universities that would have been harmed by the policy.
