Reference

Compare the stack

These tools overlap but aren't competitors. Most production teams use several together.

| Feature | LangChain | LangGraph | LangSmith | LlamaIndex | Haystack | DSPy | CrewAI | AutoGen | Pydantic AI | Semantic Kernel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary purpose | Compose LLM building blocks | Orchestrate stateful agents | Observe, evaluate, monitor | Connect LLMs to your data | Production search & RAG pipelines | Optimize prompts programmatically | Role-based multi-agent teams | Conversational multi-agent systems | Type-safe structured agents | Enterprise polyglot AI SDK |
| Best for | RAG, chains, prompt pipelines | Multi-step / multi-agent flows | Debugging & regression tests | Enterprise document Q&A | Hybrid search + RAG at scale | Measurable prompt iteration | Specialist agent crews | Code-writing agent teams | Validated structured outputs | .NET / Java AI integration |
| Runtime | Library (Py & JS) | Library + hosted Platform | Hosted SaaS + self-host | Library (Py & JS) | Library (Py) | Library (Py) | Library (Py) | Library (Py & .NET) | Library (Py) | SDK (.NET, Py, Java) |
| Plays well with | Any model, any vector DB | LangChain runnables | LangChain, LlamaIndex, OTel | LangChain, agents, LangSmith | Elasticsearch, Weaviate, Qdrant | Any LM via LiteLLM | LangChain tools, LiteLLM | Azure OpenAI, MCP, Docker | FastAPI, Logfire, LangSmith | Azure AI, OpenAI, Hugging Face |

A typical stack

  1. LlamaIndex: ingest and index your private documents.
  2. LangChain: wrap retrievers, models, and tools as composable runnables.
  3. LangGraph: orchestrate the agent loop with state, retries, and human review.
  4. LangSmith: trace every run, score quality, and catch regressions.
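
To make the division of labor concrete, here is a minimal sketch of the four roles in plain Python. It deliberately uses no calls from LlamaIndex, LangChain, LangGraph, or LangSmith: every function and class below (`build_index`, `make_retriever`, `fake_model`, `Trace`, `run_agent`) is a hypothetical stand-in invented for illustration, and the "model" simply echoes the first retrieved passage. Treat it as a picture of how the layers hand off to each other, not as any library's API.

```python
from dataclasses import dataclass, field
from typing import Callable


# Role 1 (indexing, where LlamaIndex would sit): ingest and index documents.
def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Map each lowercase word to the ids of the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index.setdefault(word.strip(".,?!"), set()).add(doc_id)
    return index


# Role 2 (composition, where LangChain would sit): wrap retrieval and the
# model as small callables that can be plugged together.
def make_retriever(docs: list[str], index: dict[str, set[int]]) -> Callable[[str], list[str]]:
    def retrieve(question: str) -> list[str]:
        hits: set[int] = set()
        for word in question.lower().split():
            hits |= index.get(word.strip(".,?!"), set())
        return [docs[i] for i in sorted(hits)]
    return retrieve


def fake_model(question: str, context: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: echoes the first passage."""
    if not context:
        raise ValueError("no context retrieved")
    return context[0]


# Role 4 (observability, where LangSmith would sit): record every step.
@dataclass
class Trace:
    events: list[tuple[str, str]] = field(default_factory=list)

    def log(self, step: str, detail: str) -> None:
        self.events.append((step, detail))


# Role 3 (orchestration, where LangGraph would sit): a loop with explicit
# state, bounded retries, and a hand-off to human review on failure.
def run_agent(question: str, retrieve, model, trace: Trace, max_retries: int = 2) -> str:
    state = {"question": question, "attempt": 0}
    while state["attempt"] <= max_retries:
        state["attempt"] += 1
        context = retrieve(state["question"])
        trace.log("retrieve", f"{len(context)} passages")
        try:
            answer = model(state["question"], context)
        except ValueError as err:
            trace.log("retry", str(err))
            continue
        trace.log("answer", answer)
        return answer
    trace.log("escalate", "handing off to human review")
    return "NEEDS_HUMAN_REVIEW"
```

Calling `run_agent` with a question that matches an indexed document returns that passage and leaves a short trace. A question with no matches exhausts its retries and escalates. In a real stack each stand-in would be replaced by the corresponding tool, but the hand-offs between them would keep roughly this shape.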