Forkbench

Compare LLMs side by side with real-time metrics.

Forkbench is a developer-focused platform for testing and comparing multiple large language models in real time. Run the same prompt across providers, review outputs side by side, and track performance metrics like latency, token usage, and cost in one clean interface designed for rapid experimentation.
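
At its core, the workflow is: send one prompt to several providers, time each call, and read token usage from each response. Below is a minimal sketch of that idea (not Forkbench's actual implementation) using the official OpenAI and Anthropic Python SDKs; the prompt and model IDs are placeholders.

```python
import time

from openai import OpenAI   # pip install openai
import anthropic            # pip install anthropic

PROMPT = "Summarize the trade-offs between batch and streaming data pipelines."

def run_openai(prompt: str):
    """Send the prompt to an OpenAI model; return (text, in_tokens, out_tokens, seconds)."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    return (resp.choices[0].message.content,
            resp.usage.prompt_tokens, resp.usage.completion_tokens, elapsed)

def run_anthropic(prompt: str):
    """Send the prompt to an Anthropic model; return (text, in_tokens, out_tokens, seconds)."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    start = time.perf_counter()
    resp = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model ID
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    return resp.content[0].text, resp.usage.input_tokens, resp.usage.output_tokens, elapsed

for name, runner in [("OpenAI", run_openai), ("Anthropic", run_anthropic)]:
    text, in_tok, out_tok, secs = runner(PROMPT)
    print(f"{name}: {secs:.2f}s, {in_tok} in / {out_tok} out tokens")
```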

Key Features

Compare outputs, speed, and cost in one view.

Side-by-Side Comparison

Run one prompt across models and compare quality, tone, and consistency.

Performance Metrics

Track latency, throughput, and cost to choose the right model.

Multi-Provider Support

Query OpenAI, Anthropic, Google, and more from one workflow.
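
One way to picture "one workflow" (an illustrative design sketch, not Forkbench's internals): each provider gets a small adapter that normalizes its SDK response into a common result shape, so the comparison layer never touches provider-specific fields. The ModelResult type, adapter names, and dummy values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModelResult:
    """Provider-agnostic shape the comparison view renders side by side."""
    provider: str
    model: str
    text: str
    input_tokens: int
    output_tokens: int
    latency_s: float

# Stand-in adapters: a real tool would wrap each provider's SDK (as in the
# earlier sketch) and map its response into a ModelResult. Values are dummies.
def call_openai(model: str, prompt: str) -> ModelResult:
    return ModelResult("openai", model, "<openai reply>", 120, 80, 1.4)

def call_anthropic(model: str, prompt: str) -> ModelResult:
    return ModelResult("anthropic", model, "<anthropic reply>", 118, 95, 1.9)

ADAPTERS: Dict[str, Callable[[str, str], ModelResult]] = {
    "openai": call_openai,
    "anthropic": call_anthropic,
}

def run_comparison(prompt: str, selections: list[tuple[str, str]]) -> list[ModelResult]:
    """Run one prompt against every (provider, model) pair the user selected."""
    return [ADAPTERS[provider](model, prompt) for provider, model in selections]

for r in run_comparison("Explain vector clocks briefly.",
                        [("openai", "gpt-4o-mini"), ("anthropic", "claude-3-5-sonnet")]):
    print(f"{r.provider}/{r.model}: {r.latency_s}s, {r.input_tokens} in / {r.output_tokens} out")
```

Keeping provider calls behind a single registry like this is what lets new providers slot in without changing the comparison or metrics code.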

How Forkbench Works

Three steps from prompt to decision.

1. Enter Your Prompt

Write one prompt to benchmark across models.

2. Select Models

Pick the models you want to test.

3. Compare Results

Review outputs with latency, token, and cost metrics.
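
To make step 3 concrete, the sketch below shows how raw measurements turn into comparable numbers: throughput as output tokens per second and cost from per-million-token prices. The function name and the prices are placeholders for illustration, not actual provider rates.

```python
def summarize(name: str, input_tokens: int, output_tokens: int, latency_s: float,
              price_in_per_m: float, price_out_per_m: float) -> str:
    """Turn raw metrics into the figures a comparison row would display."""
    throughput = output_tokens / latency_s if latency_s > 0 else 0.0
    cost = (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000
    return f"{name}: {latency_s:.2f}s, {throughput:.1f} tok/s, ${cost:.4f}"

# Placeholder prices in USD per million tokens -- illustrative only.
print(summarize("model-a", input_tokens=850, output_tokens=420, latency_s=3.1,
                price_in_per_m=0.50, price_out_per_m=1.50))
print(summarize("model-b", input_tokens=850, output_tokens=510, latency_s=5.6,
                price_in_per_m=3.00, price_out_per_m=15.00))
```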

Supported Models

OpenAI

  • GPT-4o
  • GPT-4o Mini
  • GPT-4 Turbo

Anthropic

  • Claude 3.5 Sonnet
  • Claude 3 Opus
  • Claude 3 Haiku

Google

  • Gemini 2.0 Flash
  • Gemini 1.5 Flash

Others

  • DeepSeek Chat
  • Sherlock Think Alpha

Ready to Compare LLMs?

Run faster experiments and choose models with confidence.