Product

Private Beta

Auto Research

67 experiments. 10 wins. 0 humans in the loop.

An autonomous ML experimentation loop: give it a question, let it decide the next experiment, and get back evidence on what actually moves the metric — without a person turning knobs.

Story

The project started from a simple frustration: ML experimentation is slow because humans bottleneck every step of the loop (hypothesis, code, run, evaluate, learn, repeat). Auto Research closes that loop. The agent decides what to try next, writes the code, runs it, evaluates the results, and feeds what it learned into the next iteration. On an H100 it ran 67 experiments in three nights; at three experiments a week, a human would need about 22 weeks, or roughly five months, to do the same.

The biggest quality jumps came from two places: live web search (world knowledge beats model IQ) and the discovery that the answer key was wrong. The system stalled at 83% and would not break through until I checked the errors by hand.

It's still an early research tool, but the lesson was clear: the scarce resource has moved. It used to be compute, then it was talent, and now it's taste, meaning knowing which question is worth answering, whether the data is honest, and whether the result makes sense. Auto Research is still in private preview, with no public demo or access yet.
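
The loop described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: `run_loop`, `propose`, and `execute` are hypothetical names, the toy functions stand in for the model call and the training run, and it assumes a metric where higher is better.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Experiment:
    hypothesis: str
    score: Optional[float] = None  # None means the run failed


@dataclass
class LoopState:
    best_score: float
    history: List[Experiment] = field(default_factory=list)
    wins: int = 0


def run_loop(
    propose: Callable[[List[Experiment]], str],
    execute: Callable[[str], Optional[float]],
    baseline: float,
    budget: int,
) -> LoopState:
    """Run `budget` experiments with no human between iterations."""
    state = LoopState(best_score=baseline)
    for _ in range(budget):
        # Hypothesis: the proposer sees everything learned so far.
        hypothesis = propose(state.history)
        # Code, run, and evaluate are folded into `execute` in this sketch.
        score = execute(hypothesis)
        # Learn: count a win only when the best score improves (assumption).
        if score is not None and score > state.best_score:
            state.best_score = score
            state.wins += 1
        state.history.append(Experiment(hypothesis, score))
    return state


# Toy stand-ins so the sketch runs; a real proposer would call a model.
def toy_propose(history: List[Experiment]) -> str:
    return f"try-lr-{len(history)}"


def toy_execute(hypothesis: str) -> Optional[float]:
    step = int(hypothesis.rsplit("-", 1)[1])
    if step == 2:
        return None  # simulate a crashed run
    return 0.80 + 0.01 * (step % 3)


state = run_loop(toy_propose, toy_execute, baseline=0.80, budget=6)
print(state.wins, len(state.history), round(state.best_score, 2))
```

Passing the full history to `propose` is what lets each iteration build on the last; failed runs stay in the history so the proposer can avoid repeating them.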

Focus

Autonomous ML experimentation and AI-driven research loops for rapid hypothesis testing.

Medium: Agent System

Technical Highlights

  • Zero human intervention between experiments: the agent drives the full hypothesis, code, run, evaluate cycle.
  • 67 experiments completed in three nights on H100 hardware.
  • Git-branch-per-experiment isolation with automatic discard on failure.
  • Real-time loss curve and metrics visualization.
  • The system's biggest improvement came from live web search: world knowledge kept paying off where raw model IQ had plateaued.
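
Branch-per-experiment isolation with discard on failure might look roughly like the sketch below. It is one plausible arrangement under stated assumptions, not the project's actual code: `run_isolated`, the `exp/` branch prefix, and the `main` base branch are hypothetical, and the `git` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> None:
    """Run a git command in the current repository, raising on failure."""
    subprocess.run(["git", *args], check=True)


def run_isolated(
    name: str,
    experiment: Callable[[], bool],
    base: str = "main",
    git: Callable[[Sequence[str]], None] = run_git,
) -> bool:
    """Run one experiment on its own branch; delete the branch if it fails."""
    branch = f"exp/{name}"  # hypothetical naming scheme
    git(["checkout", "-b", branch, base])
    try:
        ok = bool(experiment())
    except Exception:
        ok = False  # a crash counts as a failed experiment

    if ok:
        # Keep the work: commit whatever the experiment changed.
        git(["add", "-A"])
        git(["commit", "--allow-empty", "-m", f"{name}: success"])
        git(["checkout", base])
    else:
        # Discard: drop tracked and untracked changes, then the branch itself.
        git(["reset", "--hard"])
        git(["clean", "-fd"])
        git(["checkout", base])
        git(["branch", "-D", branch])
    return ok
```

Because every failed branch is deleted, the base branch only ever sees experiments that completed, which keeps a bad run from contaminating the next one.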

Technical Stack

  • Claude API (autonomous reasoning loop)
  • Python + Colab runtime
  • Git branch-per-experiment isolation
  • Real-time streaming telemetry
  • Pandas results analysis
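
As an illustration of the Pandas results analysis step, the sketch below counts wins in an experiment log. The rows are placeholder values for illustration only, not results from the actual runs; the column names and the definition of a "win" (beating every earlier successful run) are assumptions.

```python
import pandas as pd

# Placeholder rows for illustration only; not results from the actual runs.
results = pd.DataFrame(
    [
        {"experiment": "exp-001", "change": "baseline", "metric": 0.70, "status": "ok"},
        {"experiment": "exp-002", "change": "add web search", "metric": 0.78, "status": "ok"},
        {"experiment": "exp-003", "change": "bigger batch", "metric": None, "status": "failed"},
        {"experiment": "exp-004", "change": "prompt tweak", "metric": 0.76, "status": "ok"},
    ]
)

# Only successful runs can count as wins.
ok = results[results["status"] == "ok"].copy()
# Best metric seen before each run (shifted so a run is not compared to itself).
ok["best_before"] = ok["metric"].cummax().shift(1)
# Assumed definition: a win beats every earlier successful run.
ok["win"] = ok["metric"] > ok["best_before"]
wins = ok[ok["win"]]

summary = {
    "runs": len(results),
    "failed": int((results["status"] == "failed").sum()),
    "wins": len(wins),
}
print(summary)
print(wins[["experiment", "change", "metric"]])
```

The first run has no earlier best to beat, so it is treated as the baseline rather than as a win.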
