Problem: Manual grading of responses does not scale and is inconsistent.
Use case: Systematically assess response quality across large test suites.
Functionality: Grader agents with configurable rubrics, batch "grade all" execution, and integration with logs for historical evaluation.
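To make the request concrete, here is a minimal sketch of how a rubric-driven grader with a batch "grade all" step might look. Every name in it (`Criterion`, `Rubric`, `grade_all`, the sample log entries) is hypothetical and chosen for illustration; it does not describe an existing API. Simple predicate checks stand in for whatever a real grader agent would do.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One rubric line: a name, a weight, and a pass/fail check (hypothetical)."""
    name: str
    weight: float
    check: Callable[[str], bool]


@dataclass
class Rubric:
    """A configurable set of weighted criteria (hypothetical)."""
    criteria: list[Criterion]

    def score(self, response: str) -> float:
        # Weighted fraction of criteria the response satisfies, in [0, 1].
        total = sum(c.weight for c in self.criteria)
        earned = sum(c.weight for c in self.criteria if c.check(response))
        return earned / total if total else 0.0


def grade_all(rubric: Rubric, logged_responses: dict[str, str]) -> dict[str, float]:
    """Batch step: apply the same rubric to every logged response."""
    return {rid: rubric.score(text) for rid, text in logged_responses.items()}


# Assumed example data: response IDs mapped to response text pulled from logs.
rubric = Rubric(criteria=[
    Criterion("non_empty", 1.0, lambda r: bool(r.strip())),
    Criterion("mentions_refund", 1.0, lambda r: "refund" in r.lower()),
])
logs = {
    "resp-1": "We will refund your order today.",
    "resp-2": "Hello, how can I help?",
    "resp-3": "",
}
scores = grade_all(rubric, logs)
```

Because the rubric is plain data, the same `grade_all` call could be re-run over older log entries to compare scores over time, which is the historical evaluation the request asks for.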
Planned
💡 Feature Requests
About 2 months ago

Aman Sharma