Command-first benchmark

Private AI Lobster Benchmark

Create a task in BestClaw first, then run the command on your own lobster server. Once the lobster finishes, post the result to our callback endpoint, and this page will sync the final evaluation automatically.

This page does not execute your command directly. It creates tasks, exposes callback parameters, polls for results, and turns the external lobster server's response into the final report.

1. Create a benchmark task
Enter the command you plan to run on the lobster server. Once created, you will get a task ID, callback endpoint, and a command template.
3. Wait for the official synced result
Create a task on the left first. After that, this area will show the task parameters, command template, and the synced result panel.
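As a rough illustration of what creating a task produces, the sketch below builds a task ID, a callback endpoint, and a command template from the command you enter. Everything named here is an assumption made for the example: the `create_task` function, the field names, the `bestclaw.example` URL, the `result.json` file name, and the shape of the template are hypothetical, not BestClaw's actual API.

```python
import secrets
import uuid

# Hypothetical base URL; the real callback endpoint is the one shown in the task panel.
CALLBACK_BASE = "https://bestclaw.example/api/callback"


def create_task(command: str) -> dict:
    """Build illustrative task metadata for a user-supplied command."""
    if not command.strip():
        raise ValueError("command must not be empty")

    task_id = uuid.uuid4().hex               # 32 hex characters
    callback_token = secrets.token_urlsafe(32)  # per-task secret
    callback_url = f"{CALLBACK_BASE}/{task_id}"

    # Assumed template: run the command, then post a result file to the callback.
    command_template = (
        f"{command} && curl -X POST {callback_url} "
        f"-H 'Authorization: Bearer {callback_token}' "
        f"-H 'Content-Type: application/json' -d @result.json"
    )
    return {
        "task_id": task_id,
        "callback_url": callback_url,
        "callback_token": callback_token,
        "command_template": command_template,
    }
```

Note that nothing in this sketch runs the command. It only produces the metadata that the external lobster server needs.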
Why this flow works
BestClaw does not execute the user command directly. It creates a task, exposes a callback contract, receives the result from the external lobster server, and renders it with a unified report template.
Command runs externally

BestClaw does not touch the user's execution environment. It only issues task metadata, exposes callback parameters, and renders the final result.
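On the external side, the work might look like the following minimal sketch: run the command locally, then prepare a POST request for the callback endpoint. The helper names, the captured fields, and the bearer-token header are assumptions for illustration. The actual callback parameters are the ones shown for your task.

```python
import json
import subprocess
import urllib.request


def run_command(command: list[str], timeout: float = 60.0) -> dict:
    """Run the benchmark command on the lobster server and capture its output."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def build_callback_request(
    callback_url: str, token: str, payload: dict
) -> urllib.request.Request:
    """Prepare (but do not send) the POST that reports a result back.

    Assumption: the token travels in an Authorization header.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it. The sketch keeps building and sending separate so the request can be inspected before anything leaves the server.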

Callback token auth

Every task gets a dedicated callback token, and only an external server with the correct token can sync a result back.
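A per-task token check could be sketched as below. The in-memory `TASK_TOKENS` store and both function names are hypothetical stand-ins, and the source does not say how BestClaw stores or compares tokens. The constant-time comparison via `hmac.compare_digest` is a common precaution against timing attacks, not a confirmed detail of this service.

```python
import hmac
import secrets

# Hypothetical in-memory store mapping task ID -> callback token.
TASK_TOKENS: dict[str, str] = {}


def issue_token(task_id: str) -> str:
    """Generate and remember a dedicated token for one task."""
    token = secrets.token_urlsafe(32)
    TASK_TOKENS[task_id] = token
    return token


def is_authorized(task_id: str, presented_token: str) -> bool:
    """Accept a callback only if the token matches the one issued for this task."""
    expected = TASK_TOKENS.get(task_id)
    if expected is None:
        return False  # unknown task
    # Compare as bytes in constant time to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), presented_token.encode("utf-8"))
```

Because each token is bound to a single task ID, a token issued for one task is rejected when presented for another.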

Unified report contract

No matter where the lobster runs, as long as it returns the agreed JSON payload, the result is rendered in one consistent report format.
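To make the idea of a shared contract concrete, here is a minimal sketch that validates an incoming JSON payload and renders it in one fixed layout. The field names (`task_id`, `status`, `score`, `summary`) and the report layout are assumptions invented for this example. The source does not spell out the agreed payload.

```python
import json

# Hypothetical contract: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "task_id": str,
    "status": str,
    "score": (int, float),
    "summary": str,
}


def parse_payload(raw: str) -> dict:
    """Parse a callback body and check it against the assumed contract."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return payload


def render_report(payload: dict) -> str:
    """Render any valid payload with the same layout, wherever it came from."""
    return "\n".join([
        f"Task:    {payload['task_id']}",
        f"Status:  {payload['status']}",
        f"Score:   {payload['score']:.1f}",
        f"Summary: {payload['summary']}",
    ])
```

Validation happens before rendering, so a payload that breaks the contract fails early with a clear error instead of producing a partial report.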