Evaluating with human feedback

Jan 5, 2023

human evaluation

One major challenge in building products to automate more complex, nuanced tasks is that defining “good” and “bad” performance becomes a major bottleneck. While you may be able to define a few examples of “bad” behavior (e.g. AVs should try to avoid collisions, customer service chatbots should not reference stale information, medical chatbots should not misdiagnose patients), judging most of the interactions users will have with the system is far more subjective.

Introducing the Human Evaluation feature, which enables AI teams to define their evaluation criteria and collect feedback from human annotators.

This functionality can be used for data labeling, for preparing fine-tuning datasets, and for building test sets for later use.
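As a rough illustration (not the product's actual API), the sketch below shows one way evaluation criteria and annotator feedback could be represented, with highly rated outputs exported as prompt/completion pairs for fine-tuning. All class, field, and function names here are hypothetical.

```python
# Hypothetical sketch: evaluation criteria, annotator feedback, and an export
# of high-quality examples for fine-tuning. Names are illustrative only.
from dataclasses import dataclass
import json


@dataclass
class Criterion:
    name: str          # e.g. "helpfulness"
    description: str   # guidance shown to annotators
    scale: tuple       # allowed ratings, e.g. (1, 2, 3, 4, 5)


@dataclass
class Annotation:
    prompt: str
    model_output: str
    criterion: str
    rating: int
    annotator_id: str


def to_finetuning_examples(annotations, min_rating=4):
    """Keep only highly rated outputs as prompt/completion pairs."""
    return [
        {"prompt": a.prompt, "completion": a.model_output}
        for a in annotations
        if a.rating >= min_rating
    ]


if __name__ == "__main__":
    criteria = [
        Criterion("helpfulness", "Does the answer resolve the user's question?", (1, 2, 3, 4, 5)),
    ]
    feedback = [
        Annotation("How do I reset my password?", "Go to Settings > Security ...", "helpfulness", 5, "annotator-01"),
        Annotation("How do I reset my password?", "Please contact support.", "helpfulness", 2, "annotator-02"),
    ]
    # Export the well-rated examples as JSONL for a fine-tuning job,
    # or hold them out as a test set for future evaluation runs.
    for example in to_finetuning_examples(feedback):
        print(json.dumps(example))
```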

Step-by-step guide