Multi-class scatterplots are essential for visually comparing data, such as examining class distributions in dimensionality reduction and evaluating classification models. Visual class separation (VCS) measures quantify human perception but are largely derived from and evaluated with datasets reflecting limited types of scatterplot features (e.g., data distribution, similar class densities). Quantitatively identifying which scatterplot features are influential to VCS tasks can enable more robust guidance for future measures. We analyze the alignment between VCS measures and people's perceptions of class separation through a crowdsourced study using 70 scatterplot features relevant to class separation. To cover a wide range of scatterplot features, we generated a set of multi-class scatterplots from 6,947 real-world datasets. Our results highlight that multiple combinations of features are needed to best explain VCS. From our analysis, we develop a composite feature model that identifies key scatterplot features for measuring VCS task performance.
Multi-class scatterplots are essential for visually comparing data, such as examining class distributions in dimensionality reduction and evaluating classification models. Visual class separation (VCS) measures quantify human perception but are largely derived from and evaluated with datasets reflecting limited types of scatterplot features (e.g., data distribution, similar class densities). Quantitatively identifying which scatterplot features are influential to VCS tasks can enable more robust guidance for future measures. We analyze the alignment between VCS measures and people's perceptions of class separation through a crowdsourced study using 70 scatterplot features relevant to class separation. To cover a wide range of scatterplot features, we generated a set of multi-class scatterplots from 6,947 real-world datasets. Our results highlight that multiple combinations of features are needed to best explain VCS. From our analysis, we develop a composite feature model that identifies key scatterplot features for measuring VCS task performance.