Thursday, December 19, 2024

AI PhD Research Meets Industry Impact: A Grammarly Intern’s Story

Grammarly’s Applied Research Scientist (ARS) internship program cultivates the next generation of AI research talent. In the program, PhD students work on research projects that contribute meaningfully to Grammarly’s mission by informing critical product decisions or paving the way for new AI-powered features. In this blog post, we’ll spotlight Rich Stureborg, a PhD candidate in computer science at Duke University and an ARS intern, who leveraged the program to publish four papers and defend his PhD ahead of schedule. 

Beyond the traditional internship

Rich joined the ARS internship program in the summer of 2023 to study LLM evaluators (models that automate LLM evaluation). Given the strong alignment between his dissertation and a variety of related projects at Grammarly, he quickly saw the opportunity for a long-term partnership. This led to an extended 1.5-year collaboration, where Rich drove his LLM evaluator research forward while contributing to adjacent projects, like building pipelines for synthetic data generation.

“There’s probably been no more productive decision I made for my PhD than joining Grammarly’s research team,” he said. “The work I did at Grammarly went straight into my dissertation. It even helped me defend my PhD early.”

Building confidence in LLM evaluators

Rich’s proudest contribution was his work to understand statistical confidence in the results of LLM evaluators. Using LLMs to evaluate other LLMs is a common strategy, but it has downsides: LLMs exhibit biases and inconsistencies as judges. Furthermore, determining the statistical confidence of LLM evaluators is an open problem. This becomes a real obstacle when, for example, you want to launch a new language model into production, need to be at least 95% certain the new model is better than the old one, and don’t expect strong signals from A/B testing.

Rich and the team created a novel methodology: a configurable Monte Carlo simulation (a mathematical technique that relies on random sampling) that computes the confidence of LLM evaluators when comparing two candidate models. They empirically validated the method against existing benchmark datasets. This framework gave new insights into how characteristics like evaluation set size can impact LLM evaluation confidence. Rich published the team’s findings and presented them at the 2024 conference of the European Chapter of the Association for Computational Linguistics (EACL).
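To make the idea concrete, here’s a minimal sketch of a Monte Carlo confidence estimate. This is not the team’s actual framework, just an illustration of the general technique: it assumes a hypothetical per-example probability p_win that the evaluator prefers the new model, simulates many evaluation runs, and measures how often a majority vote declares the new model the winner.

```python
import numpy as np

def evaluator_confidence(p_win: float, eval_set_size: int,
                         n_trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of how often an LLM evaluator would declare
    the new model (B) better than the old one (A) on a random eval set.

    p_win: assumed probability the evaluator prefers B on one example
           (hypothetical input; a real study would estimate this)
    eval_set_size: number of examples per simulated evaluation run
    """
    rng = np.random.default_rng(seed)
    # Each trial simulates one evaluation run: eval_set_size per-example
    # judgments, where a "success" means the evaluator prefers B.
    wins = rng.binomial(n=eval_set_size, p=p_win, size=n_trials)
    # A run declares B better if B wins a strict majority of examples.
    return float(np.mean(wins > eval_set_size / 2))

# How does evaluation set size affect confidence when B is only
# slightly better (the evaluator prefers B 53% of the time)?
for n in (50, 200, 1000, 5000):
    print(n, evaluator_confidence(p_win=0.53, eval_set_size=n))
```

Even this toy version illustrates the kind of insight described above: when the true advantage is slim, small evaluation sets leave substantial doubt, while larger sets push confidence toward certainty.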

Rich credits the research team’s collective style of work for the project’s success: “The research environment is extremely collaborative, as the team is small and tight-knit. Some folks have been here for more than seven years, and they’ve helped me connect within the company when needed.” 

Looking ahead

We’re thrilled to welcome Rich to Grammarly as a full-time research scientist. His journey showcases Grammarly’s commitment to applied research: nurturing talent with ambitious research agendas that deliver real-world impact.

Interested in internships at Grammarly? We encourage you to visit our careers page to view our latest openings. 
