Production RAG evaluation API
Eval Ninja Cloud Evaluation API
RAG evaluation API with authentication for cloud deployment
Base URL
Choose the API target
The same live requests can target Eval Ninja Cloud or a self-hosted endpoint.
Current target: https://api.eval.ninja
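One way to switch between the cloud target and a self-hosted endpoint is to resolve the base URL once and build every path from it. The sketch below is only an illustration: the environment variable name `EVAL_NINJA_BASE_URL` and both helper functions are hypothetical and not part of the API.

```python
import os

# Cloud target listed on this page.
DEFAULT_BASE_URL = "https://api.eval.ninja"


def resolve_base_url(env=None):
    """Return the API base URL without a trailing slash.

    EVAL_NINJA_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("EVAL_NINJA_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def endpoint(base_url, path):
    """Join a base URL and an API path such as /v1/evaluate."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

For example, `endpoint(resolve_base_url(), "/v1/evaluate")` resolves to the cloud URL unless the variable points at a self-hosted deployment.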
POST
/v1/auth/login Get an access token
Run this first. The evaluation and metrics requests authenticate with the bearer token that this call returns.
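A minimal sketch of the login step in Python, using only the standard library. The request fields mirror the request JSON documented for this endpoint. The login response format is not shown on this page, so reading the token from an `access_token` field is an assumption, and the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://api.eval.ninja"


def build_login_request(client_id, client_secret, scope="eval:run", base_url=BASE_URL):
    """Build the POST /v1/auth/login request without sending it."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # your organization ID
        "client_secret": client_secret,  # your API key
        "scope": scope,
    }
    return urllib.request.Request(
        base_url + "/v1/auth/login",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(login_response):
    """Pull the bearer token out of a decoded login response.

    ASSUMPTION: the token is returned under "access_token"; this page
    does not document the response body.
    """
    token = login_response.get("access_token")
    if not token:
        raise ValueError("login response did not contain an access token")
    return token


def login(client_id, client_secret):
    """Send the login request and return the bearer token."""
    request = build_login_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return extract_token(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.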
Request JSON
{
  "grant_type": "client_credentials",
  "client_id": "your-organization-id",
  "client_secret": "your-api-key",
  "scope": "eval:run"
}
POST
/v1/evaluate Run an evaluation
Submit one RAG sample and score generation quality, retrieval quality, or both.
Access token required. Complete step 1 (/v1/auth/login) before running this request.
Metrics
Some metrics are disabled because they require fields that this sample does not include.
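As a rough sketch, an evaluation call could be assembled like this in Python. It assumes the bearer token travels in a standard `Authorization: Bearer` header, which this page implies but does not spell out. The function name and the split between `sample` and `metrics` arguments are illustrative choices.

```python
import json
import urllib.request


def build_evaluate_request(token, sample, metrics, base_url="https://api.eval.ninja"):
    """Build the POST /v1/evaluate request without sending it.

    `sample` holds user_input, response, retrieved_contexts and reference;
    `metrics` is the list of metric names to score.
    """
    payload = dict(sample, metrics=list(metrics))
    return urllib.request.Request(
        base_url + "/v1/evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the token is sent as a standard Bearer header.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` would then yield the response JSON.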
Request JSON
{
  "user_input": "How can machine learning models handle class imbalance in fraud detection?",
  "response": "Machine learning models can handle class imbalance in fraud detection through several techniques: SMOTE for synthetic data generation, cost-sensitive learning to penalize misclassification of minority classes, ensemble methods like Random Forest with balanced sampling, and threshold tuning to optimize precision-recall trade-offs. Feature engineering and anomaly detection approaches are also effective for identifying rare fraudulent patterns.",
  "retrieved_contexts": [
    "Class imbalance is a common challenge in fraud detection where fraudulent transactions represent less than 1% of all transactions.",
    "SMOTE (Synthetic Minority Oversampling Technique) generates synthetic examples of minority classes to balance datasets.",
    "Cost-sensitive learning assigns different misclassification costs to different classes, making models more sensitive to minority class errors."
  ],
  "reference": null,
  "metrics": [
    "faithfulness",
    "answer_relevancy",
    "context_precision"
  ]
}
Response shape 200
91.7% Average
3 Metrics
0 Failed
faithfulness: 95%
answer_relevancy: 88%
context_precision: 92%
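The summary figures are consistent with a plain arithmetic mean of the three metric scores: (0.95 + 0.88 + 0.92) / 3 ≈ 0.9167, which displays as 91.7% and rounds to 0.92. The page does not state how the average is computed, so treat the sketch below as an assumption that happens to reproduce the numbers. The `summarize` helper is illustrative.

```python
from statistics import fmean


def summarize(metrics):
    """Average the scores in a list of {"name": ..., "score": ...} entries.

    ASSUMPTION: the reported average is an unweighted arithmetic mean.
    """
    return fmean(entry["score"] for entry in metrics)


metrics = [
    {"name": "faithfulness", "score": 0.95},
    {"name": "answer_relevancy", "score": 0.88},
    {"name": "context_precision", "score": 0.92},
]

average = summarize(metrics)
print(f"{average:.1%}")   # 91.7%
print(round(average, 2))  # 0.92
```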
{
  "summary": {
    "average_score": 0.92,
    "successful_metrics": 3,
    "failed_metrics": 0,
    "interpretation": "Strong overall performance across selected metrics."
  },
  "metrics": [
    {
      "name": "faithfulness",
      "score": 0.95
    },
    {
      "name": "answer_relevancy",
      "score": 0.88
    },
    {
      "name": "context_precision",
      "score": 0.92
    }
  ]
}
GET
/v1/metrics List available metrics
Inspect the metrics supported by the configured API target.
Example response 200
{
  "metrics": [
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
    "answer_correctness",
    "context_relevancy",
    "answer_similarity",
    "harmfulness",
    "coherence",
    "context_entity_recall",
    "rouge_score",
    "bleu_score",
    "exact_match",
    "string_match",
    "semantic_similarity",
    "aspect_critic"
  ]
}
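Because the supported metrics depend on the configured target, it can help to check a requested metric list against this response before calling /v1/evaluate. A minimal Python sketch follows. It assumes the bearer token is sent in a standard `Authorization` header, and both function names are illustrative.

```python
import urllib.request


def build_metrics_request(token, base_url="https://api.eval.ninja"):
    """Build the GET /v1/metrics request without sending it."""
    return urllib.request.Request(
        base_url + "/v1/metrics",
        # ASSUMPTION: the token is sent as a standard Bearer header.
        headers={"Authorization": "Bearer " + token},
        method="GET",
    )


def unsupported_metrics(requested, metrics_response):
    """Return the requested metric names missing from a /v1/metrics response."""
    available = set(metrics_response["metrics"])
    return [name for name in requested if name not in available]
```

An empty list from `unsupported_metrics` means every requested name appears in the target's metric list.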