Logging

This guide explains what logging is, why it's useful to log data, and what logs look like in Braintrust. Before proceeding, make sure to read the quickstart guide and set up an API key.

Logging Screenshot

What are logs?

Logs are the recorded data and metadata from an AI routine. We record the inputs and outputs of your LLM calls on our platform to help you understand how your model performs against a set of predefined tasks, identify patterns, and diagnose issues.

In Braintrust, logs consist of traces, which roughly correspond to a single request or interaction in your application. Traces consist of one or more spans, each of which corresponds to a unit of work in your application (e.g. an LLM call). You usually collect logs as you run your application, both internally (staging) and externally (production), and utilize them to debug issues, track user behavior, and collect data into datasets.
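
For example, a request that retrieves context and then calls an LLM would be logged as one trace with a child span for each step. Below is a minimal sketch of that shape using nested wrapTraced functions; the function names and bodies are purely illustrative.

import { initLogger, wrapTraced } from "braintrust";
 
// wrapTraced logs to the current logger initialized here.
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
// Each wrapped function becomes a span. Calling one wrapped function from
// another nests the spans under a single trace.
const retrieveContext = wrapTraced(async function retrieveContext(
  query: string,
) {
  return [`A document related to: ${query}`];
});
 
const answerQuestion = wrapTraced(async function answerQuestion(query: string) {
  const docs = await retrieveContext(query); // child span
  return `Answered "${query}" using ${docs.length} document(s)`; // placeholder for your LLM call
});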

See the tracing guide for more details on how to trace your code in Braintrust.

Why log in Braintrust?

By design, logs are exactly the same data structure as Experiments. This leads to a number of useful properties:

  • If you instrument your code to run evals, you can reuse this instrumentation to generate logs
  • Your logged traces capture exactly the same data as your evals
  • You can reuse automated and human review scores across both experiments and logs

The key insight is that if you use Braintrust to both run evals and log traces, you automatically record data in exactly the right format to evaluate with it. This lets you build a feedback loop between what you observe in the real world and what you evaluate offline, which is one of the most important aspects of building high-quality AI applications.

Writing logs

To log to Braintrust, simply wrap the code you wish to log. Braintrust will automatically capture and log information behind the scenes.

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";
 
// You just need to initialize this, and `wrapTraced` will automatically log to it.
// In more advanced cases (see below), you can initialize spans directly from the logger.
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);
 
const someLLMFunction = wrapTraced(async function someLLMFunction(
  input: string,
) {
  return client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    model: "gpt-4o",
  });
});
 
export async function POST(req: Request) {
  return await someLLMFunction(await req.text());
}

For full details, refer to the tracing guide, which describes how to log traces to Braintrust.

Viewing logs

To view logs, navigate to the Logs tab in the appropriate project in the Braintrust UI. Logs are automatically updated in real-time as new traces are logged.

You can filter logs by tags, time range, and arbitrary subfields using Braintrust Query Language syntax. Here are a few examples of common filters:

Description | Syntax
Logs older than the past day | created < CURRENT_DATE - INTERVAL 1 DAY
Logs with a user_id field equal to 1234 | metadata.user_id = '1234'
Logs with a Factuality score greater than 0.5 | scores.Factuality > 0.5

Monitor page

To monitor your logs, select Go to monitor on the right-hand side of the Logs page. This page shows aggregate values for latency, token count, time to first token, cost, and scores across your logs.

Monitor page

Select the Group by dropdown menu to group the data shown using metadata fields.

Monitor page with group by

Querying through the API

For basic filters and access to the logs, you can use the project logs endpoint. This endpoint supports the same query syntax as the UI, and also allows you to specify additional fields to return.

For more advanced queries, you can use the BTQL endpoint.
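
As a rough sketch, a BTQL query over project logs might look like the following. The endpoint path, request shape, and query text here are assumptions for illustration; consult the API and BTQL references for the exact syntax.

// Illustrative only: the endpoint path and BTQL text below are assumptions;
// see the API reference for the exact request shape.
async function queryLogs(projectId: string) {
  const res = await fetch("https://api.braintrust.dev/btql", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query: `select: * | from: project_logs('${projectId}') | filter: scores.Factuality > 0.5 | limit: 10`,
    }),
  });
  return res.json();
}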

User feedback

Braintrust supports logging user feedback, which can take multiple forms:

  • A score for a specific span, e.g. the output of a request could be 👍 (corresponding to 1) or 👎 (corresponding to 0), or a document retrieved in a vector search might be marked as relevant or irrelevant on a scale from 0 to 1.
  • An expected value, which gets saved in the expected field of a span, alongside input and output. This is a great place to store corrections.
  • A comment, which is a free-form text field that can be used to provide additional context.
  • Additional metadata fields, which allow you to track information about the feedback, like the user_id or session_id.

Each time you submit feedback, you can specify one or more of these fields using the logFeedback() / log_feedback() method. It only requires the id of the span you want to log feedback for, plus the feedback fields you want to update.

The following example shows how to log feedback within a simple API endpoint.

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);
 
const someLLMFunction = wrapTraced(async function someLLMFunction(
  input: string,
) {
  return client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    model: "gpt-4o",
  });
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const text = await req.text();
    const result = await someLLMFunction(text);
    span.log({ input: text, output: result });
    return {
      result,
      requestId: span.id,
    };
  });
}
 
// Assumes that the request is a JSON object with the requestId generated
// by the previous POST request, along with additional parameters like
// score (should be 1 for thumbs up and 0 for thumbs down), comment, and userId.
export async function POSTFeedback(req: Request) {
  const body = await req.json();
  logger.logFeedback({
    id: body.requestId,
    scores: {
      correctness: body.score,
    },
    comment: body.comment,
    metadata: {
      user_id: body.userId,
    },
  });
}

Collecting multiple scores

Often, you want to collect multiple scores for a single span. For example, multiple users might provide independent feedback on a single document. Because each update to a span's scores or expected fields overwrites the previous value, logging every submission to the same span would only keep the most recent one. Instead, create a new span for each submission and log the score in its scores field. When you view and use the trace, Braintrust automatically averages the scores in the parent span(s).

import { initLogger, wrapOpenAI, wrapTraced } from "braintrust";
import OpenAI from "openai";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
const client = wrapOpenAI(
  new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  }),
);
 
const someLLMFunction = wrapTraced(async function someLLMFunction(
  input: string,
) {
  return client.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "Classify the following text as a question or a statement.",
      },
      {
        role: "user",
        content: input,
      },
    ],
    model: "gpt-4o",
  });
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const text = await req.text();
    const result = await someLLMFunction(text);
    span.log({ input: text, output: result });
    return {
      result,
      requestId: span.export(),
    };
  });
}
 
export async function POSTFeedback(req: Request) {
  const body = await req.json();
  logger.traced(
    async (span) => {
      logger.logFeedback({
        id: span.id, // Use the newly created span's id, instead of the original request's id
        comment: body.comment,
        scores: {
          correctness: body.score,
        },
        metadata: {
          user_id: body.userId,
        },
      });
    },
    {
      parent: body.requestId,
      name: "feedback",
    },
  );
}

Tags

Braintrust supports curating logs by adding tags, and then filtering on them in the UI. Tags naturally flow between logs, to datasets, and even to experiments, so you can use them to track various kinds of data across your application, and track how they change over time.

Configuring tags

Tags are configured at the project level, and in addition to a name, you can also specify a color and description. To configure tags, navigate to the Configuration tab in a project, where you can add, modify, and delete tags.

Configure tags

Adding tags in the SDK

You can also add tags to logs using the SDK. To do so, simply specify the tags field when you log data.

import { wrapOpenAI, initLogger } from "braintrust";
import { OpenAI } from "openai";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
const client = wrapOpenAI(new OpenAI({ apiKey: process.env.OPENAI_API_KEY }));
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const input = await req.text();
    const result = await client.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: input }],
    });
    span.log({ input, output: result, tags: ["user-action"] });
    return {
      result,
      requestId: span.id,
    };
  });
}

Tags can only be applied to top-level spans, e.g. those created via traced() or logger.startSpan() / logger.start_span(). You cannot apply tags to subspans (those created from another span), because tags are properties of the whole trace, not individual spans.

You can also apply tags while capturing feedback via the logFeedback() / log_feedback() method.

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
export async function POSTFeedback(req: Request) {
  const { spanId, comment, score, userId } = await req.json();
  logger.logFeedback({
    id: spanId, // The id of the span that was logged when handling the original request
    comment,
    scores: {
      correctness: score,
    },
    metadata: {
      user_id: userId,
    },
    tags: ["user-feedback"],
  });
}

Filtering by tags

To filter by tags, simply select the tags you want to filter by in the UI.

Online evaluation

Although you can log scores from your application, it can be awkward and computationally intensive to run evals code in your production environment. To solve this, Braintrust supports server-side online evaluations that are automatically run asynchronously as you upload logs. You can pick from the pre-built autoevals functions or your custom scorers, and define a sampling rate along with more granular filters to control which logs get evaluated.

Configuring online evaluation

To create an online evaluation, navigate to the Configuration tab in a project and create an online scoring rule.

The score will now automatically run at the specified sampling rate for all logs in the project.

Defining custom scoring logic

In addition to the pre-built autoevals, you can define your own custom scoring logic. Currently, you can do so by creating custom scorers in the Playground.
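
As a rough sketch, a custom TypeScript scorer is a function that receives the logged fields and returns a score between 0 and 1. The exact handler signature expected by the Playground may differ; the shape below is an assumption for illustration.

// Illustrative only: assumes the scorer receives { input, output, expected }
// and returns a number between 0 and 1.
function exactMatch({
  output,
  expected,
}: {
  input: string;
  output: string;
  expected?: string;
}) {
  return output.trim() === (expected ?? "").trim() ? 1 : 0;
}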

Logging multiple projects

The first logger you initialize in your program becomes the "current" (default) logger. Any subsequent traced function calls will use the current logger. If you'd like to log to multiple projects, you will need to create multiple loggers, and relying on whichever one happens to be current can lead to unexpected behavior.

When you initialize a logger, you can specify not to set it as the current logger:

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  setCurrent: false,
});
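
For example, you might keep a separate logger per project and call traced on each one explicitly, rather than relying on the current logger. The project names and handler below are illustrative.

import { initLogger } from "braintrust";
 
const chatLogger = initLogger({
  projectName: "Chat Assistant",
  apiKey: process.env.BRAINTRUST_API_KEY,
  setCurrent: false,
});
 
const searchLogger = initLogger({
  projectName: "Search Ranking",
  apiKey: process.env.BRAINTRUST_API_KEY,
  setCurrent: false,
});
 
// Call traced on a specific logger so each trace lands in the right project.
// Use searchLogger.traced(...) the same way for search requests.
export async function handleChat(message: string) {
  return chatLogger.traced(async (span) => {
    const output = `echo: ${message}`; // placeholder for your LLM call
    span.log({ input: message, output });
    return output;
  });
}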

Caching loggers

When you initialize a logger, it performs some background work to (a) log in to Braintrust if you haven't already, and (b) fetch project metadata. This background work does not block your code; however, if you initialize a new logger on each request, it will noticeably slow down logging performance. Instead, it's a best practice to cache loggers and reuse them:

import { initLogger, Logger } from "braintrust";
 
// See docs below for more information on setting the async flush flag to true or false
const loggers = new Map<string, Logger<true>>();
 
function getLogger(projectName: string): Logger<true> {
  if (!loggers.has(projectName)) {
    loggers.set(
      projectName,
      initLogger({
        projectName,
        apiKey: process.env.BRAINTRUST_API_KEY,
        setCurrent: false,
        asyncFlush: true,
      }),
    );
  }
  return loggers.get(projectName)!;
}

Initializing login

Last but not least, the logger lazily authenticates with Braintrust when it is first used. Login state is shared across loggers, but you may want to call login() explicitly once so that you don't have to pass an API key to each logger (or use the BRAINTRUST_API_KEY environment variable).

There is a lower-level mechanism which can even let you use different API keys for different loggers, but it's not documented or officially supported. Get in touch if you need this.

import { login } from "braintrust";
 
// Run this function once at the beginning of your application
async function init() {
  await login({
    apiKey: process.env.BRAINTRUST_API_KEY,
  });
}

Implementation considerations

Data model

  • Each log entry is associated with an organization and a project. If you do not specify a project name or id in initLogger()/init_logger(), the SDK will create and use a project named "Global".
  • Although logs are associated with a single project, you can still use them in evaluations or datasets that belong to any project.
  • Like evaluation experiments, log entries contain optional input, output, expected, scores, metadata, and metrics fields. We encourage you to populate them to give your logs context (see the example after this list).
  • Logs are indexed automatically to enable efficient search. When you load logs, Braintrust automatically returns the most recently updated log entries first. You can also search by arbitrary subfields, e.g. metadata.user_id = '1234'. Currently, inequality filters (e.g. scores.accuracy > 0.5) do not use an index.
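
For example, a single log entry with several of these fields populated might look like the following; the field values are illustrative.

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
// All of these fields are optional, but filling them in makes logs easier to
// search, review, and reuse in evaluations.
logger.log({
  input: { question: "Is the sky blue?" },
  output: { answer: "Yes" },
  expected: { answer: "Yes" },
  scores: { correctness: 1 },
  metadata: { user_id: "1234" },
});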

Production vs. staging

There are a few ways to handle production vs. staging data. The most common pattern we see is to split them into different projects, so that they are separated and code changes to staging cannot affect production. Separating projects also allows you to enforce access controls at the project level.
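
One lightweight way to implement the separate-project pattern is to derive the project name from your deployment environment. The environment variable and naming convention below are just one possible choice.

import { initLogger } from "braintrust";
 
// Assumes your deploy tooling sets DEPLOY_ENV to "production" or "staging".
const env = process.env.DEPLOY_ENV ?? "staging";
 
const logger = initLogger({
  projectName: env === "production" ? "My App" : "My App (staging)",
  apiKey: process.env.BRAINTRUST_API_KEY,
});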

Alternatively, if it's easier to keep things in one project (e.g. to have a single spot to triage them), you can use tags to separate them. If you need to physically isolate production and staging, you can create separate organizations, each mapping to a different deployment.

Experiments, prompts, and playgrounds can all use data across projects. For example, if you want to reference a prompt from your production project in your staging logs, or evaluate using a dataset from staging in a different project, you can do so.

Initializing

The initLogger()/init_logger() method initializes the logger. Unlike the experiment init() method, the logger lazily initializes itself, so that you can call initLogger()/init_logger() at the top of your file (in module scope). The first time you log() or start a span, the logger will log into Braintrust and retrieve/initialize project details.

Flushing

The SDK can operate in two modes: either it sends log statements to the server after each request, or it buffers them in memory and sends them over in batches. Batching reduces the number of network requests and makes the log() command as fast as possible. Each SDK flushes logs to the server as fast as possible, and attempts to flush any outstanding logs when the program terminates.

You can enable background batching by setting the asyncFlush / async_flush flag to true in initLogger()/init_logger(). When async flush mode is on, you can use the .flush() method to manually flush any outstanding logs to the server.

import { initLogger } from "braintrust";
 
// In the JS SDK, `asyncFlush` is false by default.
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: true,
});
 
// ... Your application logic ...
 
// Some function that is called while cleaning up resources
async function cleanup() {
  await logger.flush();
}

Serverless environments

The asyncFlush / async_flush flag controls whether or not logs are flushed when a trace completes. This flag should be set to false in serverless environments (other than Vercel), where the process may halt as soon as the request completes. By default, asyncFlush is set to false in the TypeScript SDK, since most TypeScript applications are serverless, while async_flush defaults to True in Python.

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: false,
});

Vercel

Braintrust automatically utilizes Vercel's waitUntil functionality if it's available, so you can set asyncFlush: true in Vercel and your requests will not need to block on logging.
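
For example, a Next.js route handler deployed on Vercel can enable asyncFlush and return immediately while logs flush in the background; the handler below is an illustrative sketch.

import { initLogger } from "braintrust";
 
// On Vercel, asyncFlush: true lets the response return right away while
// Braintrust flushes logs in the background via waitUntil.
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: true,
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const input = await req.text();
    const output = `echo: ${input}`; // placeholder for your LLM call
    span.log({ input, output });
    return new Response(output);
  });
}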
