What is Annotation?
TL;DR
The process of adding labels and other semantic information to data. It is a foundational step that largely determines the quality of AI training data.
Annotation: Definition & Explanation
Annotation is the general term for adding labels and other semantic information to data such as text, images, audio, and video. The term is often used interchangeably with data labeling, but annotation is the broader concept: it also covers drawing bounding boxes (enclosing objects in rectangles), segmentation (dividing an image into regions at the pixel level), and named entity annotation (marking person names, place names, and similar entities in text). Annotation quality control can make or break a machine learning project, because a model learns whatever errors and inconsistencies its labels contain. Three practices are essential: measuring inter-annotator agreement, establishing clear guidelines, and implementing review processes. Recently, using LLMs such as GPT-4 and Claude to assist with annotation has become increasingly common; it can reduce cost and, when paired with human review, improve quality.
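Inter-annotator agreement, mentioned above, can be quantified in several ways. The sketch below uses Cohen's kappa, one common choice for two annotators assigning categorical labels, which corrects raw agreement for the agreement expected by chance. The function name cohens_kappa, the entity labels, and the sample data are illustrative assumptions, not part of any particular annotation tool.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration: kappa = (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is chance agreement.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both annotators used one identical label for every item, kappa is
    # undefined (division by zero); returning 1.0 here is a convention
    # chosen for this sketch.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up entity labels from two annotators on the same six text spans.
annotator_a = ["PER", "LOC", "PER", "ORG", "LOC", "PER"]
annotator_b = ["PER", "LOC", "ORG", "ORG", "LOC", "PER"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Low values usually suggest that the guidelines are ambiguous and need revision before more data is labeled.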