Ragas Cloud Evaluation API
A production RAG evaluation API with authentication for cloud deployment.
Base URL
https://api.eval.ninja
The same endpoints are served by cloud and self-hosted deployments; substitute your own base URL when self-hosting.
Authentication
OAuth 2.0 client credentials token
POST /v1/auth/login Generate a bearer token for the protected evaluation endpoints.
Request body JSON
{
  "grant_type": "client_credentials",
  "client_id": "your-organization-id",
  "client_secret": "your-api-key",
  "scope": "eval:run"
}
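The client-credentials exchange above can be sketched as a pair of small helpers. This is a minimal sketch, not an official client: the access_token field name in the token response is an assumption (it is the standard OAuth 2.0 name), and the credential values are placeholders.

```python
TOKEN_URL = "https://api.eval.ninja/v1/auth/login"  # base URL from above

def build_token_request(client_id: str, client_secret: str) -> dict:
    """Assemble the client-credentials body shown in the example above."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "eval:run",
    }

def bearer_header(token_response: dict) -> dict:
    """Build the Authorization header for the protected endpoints.

    Assumes the standard OAuth 2.0 access_token response field."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}
```

POST build_token_request(...) as JSON to TOKEN_URL, then attach bearer_header(...) to each protected request.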
Evaluation
Evaluate RAG samples
POST /v1/evaluate Score one RAG output for faithfulness, answer relevancy, and retrieval quality.
Authentication required: obtain a bearer token from /v1/auth/login first and send it in the Authorization header of each evaluation request.
Metrics
Select the metrics to include via the metrics array in the request. Some metrics require fields the sample above does not provide (e.g. reference) and cannot be computed without them.
Request body JSON
{
  "user_input": "How can machine learning models handle class imbalance in fraud detection?",
  "response": "Machine learning models can handle class imbalance in fraud detection through several techniques: SMOTE for synthetic data generation, cost-sensitive learning to penalize misclassification of minority classes, ensemble methods like Random Forest with balanced sampling, and threshold tuning to optimize precision-recall trade-offs. Feature engineering and anomaly detection approaches are also effective for identifying rare fraudulent patterns.",
  "retrieved_contexts": [
    "Class imbalance is a common challenge in fraud detection where fraudulent transactions represent less than 1% of all transactions.",
    "SMOTE (Synthetic Minority Oversampling Technique) generates synthetic examples of minority classes to balance datasets.",
    "Cost-sensitive learning assigns different misclassification costs to different classes, making models more sensitive to minority class errors."
  ],
  "reference": null,
  "metrics": [
    "faithfulness",
    "answer_relevancy",
    "context_precision"
  ]
}

Example response 200
{
  "metrics": [
    {
      "name": "faithfulness",
      "score": 0.95,
      "percentage": 95,
      "interpretation": "The answer is well grounded in the retrieved context.",
      "error": null
    },
    {
      "name": "answer_relevancy",
      "score": 0.88,
      "percentage": 88,
      "interpretation": "The answer addresses the question directly.",
      "error": null
    },
    {
      "name": "context_precision",
      "score": 0.92,
      "percentage": 92,
      "interpretation": "Retrieved passages are mostly pertinent to the query.",
      "error": null
    }
  ],
  "summary": {
    "average_score": 0.92,
    "average_percentage": 91.7,
    "successful_metrics": 3,
    "failed_metrics": 0,
    "generation_quality": 0.92,
    "retrieval_quality": 0.92,
    "interpretation": "Strong overall performance across selected metrics.",
    "org_id": "your-org-id",
    "user_id": "your-user-id"
  },
  "request": {
    "user_input": "How can machine learning models handle class imbalance in fraud detection?",
    "response": "Machine learning models can handle class imbalance...",
    "reference": null,
    "retrieved_contexts": [
      "Class imbalance is a common challenge..."
    ],
    "config": null,
    "metrics": [
      "faithfulness",
      "answer_relevancy",
      "context_precision"
    ]
  }
}
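A caller will typically reduce the response above to per-metric scores and an overall figure. A minimal sketch, using only the field names shown in the example response (skipping errored metrics is an assumption about sensible client behavior, not documented API semantics):

```python
def metric_scores(response: dict) -> dict:
    """Map metric name -> score, skipping any metric that reported an error."""
    return {m["name"]: m["score"] for m in response["metrics"] if m.get("error") is None}

def average_score(response: dict) -> float:
    """Recompute the summary average from the per-metric scores."""
    scores = metric_scores(response)
    return round(sum(scores.values()) / len(scores), 2) if scores else 0.0
```

Applied to the example response above, average_score returns 0.92, matching summary.average_score.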
Metrics
List available metrics
GET /v1/metrics Inspect the metrics supported by the configured API target.
Example response 200
{
  "metrics": [
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
    "answer_correctness",
    "context_relevancy",
    "answer_similarity",
    "harmfulness",
    "coherence",
    "context_entity_recall",
    "rouge_score",
    "bleu_score",
    "exact_match",
    "string_match",
    "semantic_similarity",
    "aspect_critic"
  ]
}
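Since some metrics need fields such as reference (see the note on the evaluate endpoint), a client can gate metric selection on the fields a sample actually carries. The requirement set below is hypothetical, as the API does not publish which metrics need a reference; confirm it against your deployment:

```python
# Hypothetical requirement set; adjust to match your API target.
REQUIRES_REFERENCE = {"context_recall", "answer_correctness", "answer_similarity"}

def selectable_metrics(requested: list, sample: dict) -> list:
    """Drop requested metrics whose required reference field is missing."""
    have_reference = sample.get("reference") is not None
    return [m for m in requested if have_reference or m not in REQUIRES_REFERENCE]
```

With the sample request above (reference is null), context_recall would be dropped while faithfulness is kept.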