POST /v1/eval

Launch an eval

Launch an evaluation. This is the API equivalent of the Eval function built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, a task function, and scoring functions. The API runs the evaluation, creates an experiment, and returns the results along with a link to the experiment. To learn more about evals, see the Evals guide.
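
For example, a minimal request needs only the four required fields described below (project_id, data, task, and scores); the values here are placeholders:

curl -X POST "https://api.braintrust.dev/v1/eval" \
  -H "Authorization: Bearer <your api key>" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "<project id>",
    "data": { "dataset_id": "<dataset id>" },
    "task": { "function_id": "<task function id>" },
    "scores": [{ "function_id": "<scorer function id>" }]
  }'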


Authorization
Required
Bearer <token>

The Authorization access token. Most Braintrust endpoints are authenticated by providing your API key as the header Authorization: Bearer [api_key] in your HTTP request. You can create an API key on the Braintrust organization settings page.

In: header

Request Body

application/json
Required

Eval launch parameters

project_id
Required
string

Unique identifier for the project to run the eval in

data
Required
Any properties in dataset_id, project_dataset_name

The dataset to use
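
As a sketch, data accepts either a direct dataset ID (the form used in the full example below) or a project/dataset name pair; the field names in the second form are assumed from the project_dataset_name variant name:

"data": { "dataset_id": "<dataset id>" }

"data": { "project_name": "<project name>", "dataset_name": "<dataset name>" }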

task
Required
Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt

The function to evaluate

scores
Required
array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt>

The functions to score the eval on
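
task and each entry in scores take the same kind of function reference. The function_id form (optionally pinned to a version) is the one used in the full example below; as a hedged sketch, the global_function variant presumably references a built-in function by name:

"task": { "function_id": "<task function id>", "version": "<optional version>" },
"scores": [
  { "function_id": "<scorer function id>" },
  { "global_function": "<global function name>" }
]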

experiment_name
string

An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.

metadata
object

Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.

stream
boolean

Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.

trial_count
number | null

The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.

is_public
boolean | null

Whether the experiment should be public. Defaults to false.

timeout
number | null

The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.

max_concurrency
number | null

The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.
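
The execution options above can be combined in a single request body; for example, to stream progress, run each input three times, and cap the run at a 10-minute timeout (600000 ms) with at most 10 concurrent tasks (values are illustrative):

"stream": true,
"trial_count": 3,
"timeout": 600000,
"max_concurrency": 10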

base_experiment_name
string | null

An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

base_experiment_id
string | null

An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.

git_metadata_settings
object | null

Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.

repo_info
object | null

Optional information about the state of the git repository when the eval was run (commit, branch, author, and so on), matching the fields shown in the example below.

curl -X POST "https://api.braintrust.dev/v1/eval" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "string",
    "data": {
      "dataset_id": "string"
    },
    "task": {
      "function_id": "string",
      "version": "string"
    },
    "scores": [
      {
        "function_id": "string",
        "version": "string"
      }
    ],
    "experiment_name": "string",
    "metadata": {
      "property1": null,
      "property2": null
    },
    "stream": true,
    "trial_count": 0,
    "is_public": true,
    "timeout": 0,
    "max_concurrency": 0,
    "base_experiment_name": "string",
    "base_experiment_id": "string",
    "git_metadata_settings": {
      "collect": "all",
      "fields": [
        "commit"
      ]
    },
    "repo_info": {
      "commit": "string",
      "branch": "string",
      "tag": "string",
      "dirty": true,
      "author_name": "string",
      "author_email": "string",
      "commit_message": "string",
      "commit_time": "string",
      "git_diff": "string"
    }
  }'

Eval launch response

{
  "project_name": "string",
  "experiment_name": "string",
  "project_url": "http://example.com",
  "experiment_url": "http://example.com",
  "comparison_experiment_name": "string",
  "scores": {
    "property1": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "score": 1,
      "diff": -1,
      "improvements": 0,
      "regressions": 0
    }
  },
  "metrics": {
    "property1": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    },
    "property2": {
      "name": "string",
      "metric": 0,
      "unit": "string",
      "diff": 0,
      "improvements": 0,
      "regressions": 0
    }
  }
}
