Andrej Karpathy Builds AI Research System Using 630 Lines of Code

The repository contains a simplified training implementation derived from Karpathy’s nanochat project and is designed to run on a single NVIDIA GPU.

Andrej Karpathy, the former OpenAI researcher, has released a minimal open-source project that allows AI agents to autonomously run and iterate on large language model training experiments using a compact codebase of roughly 630 lines.

The project, called “autoresearch”, was published on GitHub and is designed to let AI agents modify model training code, run short experiments, evaluate the results, and repeat the process in an automated research loop.

Karpathy said the goal of the project is to create a small yet functional research setup in which an AI agent can test architectural changes, hyperparameters, and optimisation strategies without continuous human supervision. In the repository documentation, he described the concept as giving “an AI agent a small but real LLM training setup” and letting it “experiment autonomously overnight.”

The system separates the roles of humans and AI agents: researchers modify a Markdown file that defines the agent's research instructions, while the agent edits a single Python file containing the model architecture, optimiser, and training loop.

Each training run operates under a fixed five-minute wall-clock time budget, which ensures experiments remain comparable even when the agent changes model size, batch size, or architecture. According to the repository, this constraint allows roughly 12 experiments per hour and about 100 experiments overnight.
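The budgeting idea can be sketched in a few lines. This is a minimal illustration of a fixed wall-clock cutoff, not the project's actual code; `run_experiment` and `train_step` are hypothetical names introduced here for clarity.

```python
import time

TIME_BUDGET_S = 5 * 60  # fixed five-minute wall-clock budget per run


def run_experiment(train_step, budget_s: float = TIME_BUDGET_S) -> int:
    """Run training steps until the wall-clock budget expires.

    Hypothetical sketch: `train_step` stands in for one optimiser step.
    Because every run gets the same wall-clock allowance, a larger model
    simply completes fewer steps, keeping runs comparable.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_s:
        train_step()
        steps += 1
    return steps
```

Under this scheme, changing model size or batch size changes how many steps fit inside the budget, but never the cost of a single experiment, which is what makes roughly 12 runs per hour a fixed rate.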

The agent evaluates each run using a validation metric called bits-per-byte and iteratively modifies the training script to improve results.
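Bits-per-byte normalises a language model's cross-entropy loss by the number of raw bytes in the text, which makes scores comparable across tokenisers. A minimal sketch of the conversion, assuming the loss is a mean per-token negative log-likelihood in nats (the helper name `bits_per_byte` is an illustration, not the repository's API):

```python
import math


def bits_per_byte(mean_nll_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean per-token cross-entropy (in nats) to bits-per-byte.

    Hypothetical helper: total nats over the dataset, divided by ln(2)
    to convert nats to bits, divided by the byte count of the raw text.
    Lower is better.
    """
    total_nats = mean_nll_nats * num_tokens
    return total_nats / (math.log(2) * num_bytes)
```

For example, a mean loss of ln(2) nats per token on a byte-level model (one token per byte) corresponds to exactly 1.0 bits-per-byte.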

The project drew praise from developers on social media for demonstrating how AI agents could automate parts of the machine-learning experimentation process by continuously generating and testing new model configurations.

Among those reacting publicly was Tobi Lütke, CEO of Shopify, who said he used the system overnight to run experiments on a query-expansion model. Lütke wrote that he woke up to “+19% score on a 0.8b model after 8 hours and 37 experiments,” describing improvements discovered through the automated experiment loop.

He added that watching the system iterate through training adjustments provided unexpected insight into how models improve, writing that he “learned more from that than months of following ML researchers.”

Responding to Lütke, Karpathy said the improvements discovered through the automated experimentation process were already transferring to larger models, noting that changes identified after roughly 650 experiments on a smaller 12-layer model “transfer well to depth 24,” referring to a larger model with 24 transformer layers.

Karpathy said the result suggests that training strategies identified through automated experimentation may scale beyond the smaller models used during the initial runs.

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.