🔍 Tracing: Prompt-to-response chains
🧪 Evaluation: Quality & correctness evals
📊 Metrics: Cost, latency, token usage
🛡️ Guardrails: Safety & compliance checks
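
To make the four categories above concrete, here is a minimal, library-free sketch of what one observed LLM call might record. It is an illustration under stated assumptions, not how any tool listed here works: `fake_llm`, `count_tokens`, `BLOCKED_TERMS`, and the per-token prices are all hypothetical stand-ins.

```python
import time
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1K tokens; real prices vary by provider and model.
PRICE_PER_1K_INPUT = 0.001
PRICE_PER_1K_OUTPUT = 0.002

# Illustrative guardrail list only; real guardrails are far more sophisticated.
BLOCKED_TERMS = {"password", "ssn"}


@dataclass
class LLMCallRecord:
    """One trace entry: the prompt-to-response pair plus its metrics."""
    prompt: str
    response: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    guardrail_flags: list = field(default_factory=list)

    @property
    def cost_usd(self) -> float:
        # Metrics: estimated cost derived from token usage.
        return (self.input_tokens / 1000 * PRICE_PER_1K_INPUT
                + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"Echo: {prompt}"


def count_tokens(text: str) -> int:
    """Crude whitespace count; real tokenizers produce different numbers."""
    return len(text.split())


def traced_call(prompt: str) -> LLMCallRecord:
    # Tracing: time the call and capture the prompt and response together.
    start = time.perf_counter()
    response = fake_llm(prompt)
    latency = time.perf_counter() - start
    # Guardrails: flag any blocked term that appears in the response.
    flags = [term for term in sorted(BLOCKED_TERMS) if term in response.lower()]
    return LLMCallRecord(prompt, response, latency,
                         count_tokens(prompt), count_tokens(response), flags)


def exact_match_eval(record: LLMCallRecord, expected: str) -> bool:
    # Evaluation: the simplest possible correctness check.
    return record.response.strip() == expected.strip()
```

Calling `traced_call("what is my password")` returns a record whose `guardrail_flags` contains `"password"`. Production tools typically replace each of these pieces with something sturdier, such as real tokenizers, model-graded evals, and policy-based guardrails.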

11 Featured Tools

01 Full Platform
Langfuse

LLM Engineering Platform — traces, evals, prompt management, and metrics to debug and improve your LLM application at every stage of development.

02 Enterprise
Dynatrace LLM Observability

Monitor, optimize, and secure Generative AI applications, LLMs, and agentic workflows with enterprise-grade observability and AI-powered root cause analysis.

03 Enterprise
Datadog LLM Observability

Develop, evaluate, and monitor LLM applications with confidence — unified with Datadog's broader infrastructure observability for full-stack visibility.

04 Full Platform
Opik by Comet

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with tracing, eval metrics, and production-ready dashboards.

05 Tracing
Traceloop / OpenLLMetry

Monitors what your model says, how fast it responds, and when output quality starts to degrade. Built on OpenTelemetry for vendor-agnostic LLM tracing.

06 Evaluation
DeepEval

The LLM Evaluation Framework — a pytest-like testing suite for LLMs with 14+ built-in metrics covering hallucination, bias, toxicity, and RAG quality.

07 AI Gateway
Portkey

Equips AI teams for production with Gateway, Observability, Guardrails, Governance, and Prompt Management in one unified platform.

08 Enterprise
Elastic LLM Observability

Detect risks, resolve issues, and keep your agentic and generative AI applications production-ready — powered by Elastic's search and analytics engine.

09 Open Source
Phoenix by Arize

Open-source LLM tracing, evaluation, and observability built on OpenTelemetry — fully agnostic of vendor, framework, and programming language.

10 Open Source
Helicone

An open-source platform for monitoring, debugging, and improving LLM applications — with one-line integration, caching, rate limiting, and cost tracking.

11 Real-Time
Honeycomb

Granular insight into how your LLMs behave in production. Troubleshoot failures faster and continuously improve model performance using real data in real time.
