LangSmith
Observability and evals for LLM applications.
LangSmith is a platform for tracing, debugging, evaluating, and monitoring LLM apps. It works with LangChain, LangGraph, and any other framework via the OpenTelemetry-compatible SDK, and it captures the prompts, tool calls, and token usage of each run.
Install
pip install langsmith
npm install langsmith
Quickstart
A minimal example to verify your setup.
import os

from langsmith import traceable
from openai import OpenAI

# Enable tracing and authenticate with LangSmith.
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "<your-key>"

# OpenAI() reads OPENAI_API_KEY from the environment.
client = OpenAI()

@traceable  # record each call to answer() as a run
def answer(question: str) -> str:
    res = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return res.choices[0].message.content

print(answer("What is observability?"))

Core concepts
Tracing
Capture nested runs with inputs, outputs, latency, token counts, and errors. Works automatically with LangChain and via @traceable for any code.
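As a rough sketch of how nested runs can arise, the example below decorates a parent function and the two helpers it calls. The function names, the NOTES data, and the run names are hypothetical, and the no-op fallback decorator is only there so the sketch still runs when the langsmith package is not installed. With tracing enabled, the helper calls would be expected to show up as child runs of the parent.

```python
try:
    from langsmith import traceable
except ImportError:
    def traceable(*args, **kwargs):
        """No-op stand-in (assumption) so the sketch runs without the SDK."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]        # used as a bare @traceable
        return lambda fn: fn      # used as @traceable(...)

# Hypothetical in-memory notes standing in for a real retrieval step.
NOTES = {
    "tracing": "Tracing records each step of a request as a run.",
    "evals": "Evals score outputs against a dataset.",
}

@traceable(run_type="tool", name="lookup_notes")
def lookup_notes(question: str) -> list:
    """Child run: return notes whose topic appears in the question."""
    q = question.lower()
    return [text for topic, text in NOTES.items() if topic in q]

@traceable(run_type="tool", name="format_answer")
def format_answer(notes: list) -> str:
    """Child run: join the notes into one answer string."""
    return " ".join(notes) if notes else "No notes found."

@traceable(name="answer_pipeline")
def answer_pipeline(question: str) -> str:
    """Parent run: the two calls below happen inside this run."""
    return format_answer(lookup_notes(question))

print(answer_pipeline("How does tracing work?"))
```

Tracing only sends data when LANGSMITH_TRACING and LANGSMITH_API_KEY are set as in the quickstart; without them the functions simply run as normal Python.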
Datasets & evals
Build evaluation datasets from production traces. Run LLM-as-judge, heuristic, and pairwise evaluators on every commit.
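To make the idea of a heuristic evaluator concrete, here is a minimal sketch in plain Python. The target function, the example data, and the run_local_eval loop are hypothetical stand-ins, not the SDK's own evaluation runner. The evaluator takes the application's outputs and the reference outputs and returns a dict with a key and a score, which is assumed here to match the shape the SDK expects.

```python
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Heuristic evaluator: 1 if the answers match, ignoring case and padding."""
    predicted = str(outputs.get("answer", "")).strip().lower()
    expected = str(reference_outputs.get("answer", "")).strip().lower()
    return {"key": "exact_match", "score": int(predicted == expected)}

def target(inputs: dict) -> dict:
    """Hypothetical stand-in for the application under test."""
    capitals = {"France": "Paris", "Japan": "Tokyo"}
    return {"answer": capitals.get(inputs["country"], "unknown")}

# Hypothetical dataset: each example pairs inputs with a reference answer.
EXAMPLES = [
    {"inputs": {"country": "France"}, "reference_outputs": {"answer": "paris"}},
    {"inputs": {"country": "Japan"}, "reference_outputs": {"answer": "Tokyo"}},
    {"inputs": {"country": "Peru"}, "reference_outputs": {"answer": "Lima"}},
]

def run_local_eval(examples, target_fn, evaluator) -> float:
    """Run the target on every example and return the mean evaluator score."""
    scores = [
        evaluator(target_fn(ex["inputs"]), ex["reference_outputs"])["score"]
        for ex in examples
    ]
    return sum(scores) / len(scores)

print(run_local_eval(EXAMPLES, target, exact_match))
```

In practice a function of this shape would typically be handed to the SDK's evaluation helper together with a dataset rather than run through a hand-written loop. The accepted argument names can differ between SDK versions, so check the SDK reference before relying on them.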
Prompt hub
Version, share, and A/B test prompts across teams. Pull prompts from code with a single SDK call.
Monitoring
Dashboards for latency, cost, error rates, and user feedback. Set alerts on regressions and drift.
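The sketch below illustrates the kind of threshold check an alert on latency or error rate amounts to. It is plain Python over hypothetical run records, not LangSmith's alerting API: RunRecord, p95, check_alerts, and the thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    """Hypothetical summary of one traced run."""
    latency_s: float
    failed: bool

def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integers
    return ordered[rank - 1]

def check_alerts(runs, max_p95_latency_s, max_error_rate):
    """Return a message for each threshold the given runs exceed."""
    if not runs:
        return []
    alerts = []
    latency = p95([r.latency_s for r in runs])
    error_rate = sum(r.failed for r in runs) / len(runs)
    if latency > max_p95_latency_s:
        alerts.append(
            f"p95 latency {latency:.2f}s exceeds {max_p95_latency_s:.2f}s"
        )
    if error_rate > max_error_rate:
        alerts.append(
            f"error rate {error_rate:.0%} exceeds {max_error_rate:.0%}"
        )
    return alerts

# Nine fast successful runs and one slow failure.
runs = [RunRecord(1.0, False) for _ in range(9)] + [RunRecord(5.0, True)]
for alert in check_alerts(runs, max_p95_latency_s=3.0, max_error_rate=0.05):
    print(alert)
```

A hosted dashboard computes aggregates like these for you; the sketch only shows what a regression alert compares against.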
Common use cases
- Debugging failed agent runs
- Regression testing prompt and model changes
- Tracking cost and latency in production
- Collecting human feedback at scale