Color in Data Visualization: Encoding Information Without Confusion

In data visualization, color does a job it has in almost no other design context: it encodes data. Color is a channel that carries quantitative or categorical information, and the viewer is expected to decode it. This is fundamentally different from brand design, where color creates mood and association, or from typography, where color establishes hierarchy. If the color encoding is ambiguous or misread, the viewer takes the wrong information away from the data.

Quantitative encoding requires perceptually uniform color scales, that is, scales where equal steps in data value produce equal perceived differences in color. Linear interpolation in RGB is not perceptually uniform: a gradient from red to green through a mid-yellow has visibly uneven steps, because human color perception does not respond evenly to equal numeric steps in RGB. Perceptually uniform scales (gradients interpolated in OKLCH, or professionally designed scales such as Viridis, Plasma, and Cividis) are built so that a 10-unit change in value produces roughly the same perceived change in color anywhere in the range. This matters whenever viewers have to estimate values from color alone.

Categorical encoding requires color sets in which every category is clearly distinct from the others, and above all from the categories it is most likely to sit next to in the visualization. Color sets designed for categorical use (Tableau's 10-color set, ColorBrewer's qualitative palettes) are built to maximize distinctiveness across all pairs of colors and, in many cases, to remain distinguishable under common types of color vision deficiency. The maximum number of reliably distinguishable categories in a single visualization is approximately 8 to 10 for an audience with full color vision, and fewer for audiences with color vision deficiency. Beyond that limit, additional colors add confusion rather than information.

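The point about RGB interpolation being perceptually uneven can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes that "RGB interpolation" means linear interpolation of gamma-encoded sRGB values, and it uses Euclidean distance in OKLab (the rectangular form of OKLCH) as an approximate stand-in for perceived difference. The function names, the red and green endpoints, and the step count are illustrative choices. Converting the OKLab ramp back to sRGB and handling out-of-gamut colors are left out.

```python
import math

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_oklab(rgb):
    """Convert an sRGB triple (channels in [0, 1]) to OKLab (L, a, b)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    # Linear sRGB -> cone-like LMS response, then a cube-root nonlinearity.
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    l_, m_, s_ = l ** (1 / 3), m ** (1 / 3), s ** (1 / 3)
    return (
        0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_,
        1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_,
        0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_,
    )

def lerp(a, b, t):
    """Linear interpolation between two equal-length tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def step_sizes(points):
    """Distance between each pair of consecutive points."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

RED, GREEN = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # illustrative endpoints
STEPS = 10                                      # illustrative step count
ts = [i / STEPS for i in range(STEPS + 1)]

# Ramp A: interpolate in sRGB, then measure each stop in OKLab.
rgb_ramp = [srgb_to_oklab(lerp(RED, GREEN, t)) for t in ts]
# Ramp B: interpolate directly in OKLab between the same endpoints.
oklab_ramp = [lerp(srgb_to_oklab(RED), srgb_to_oklab(GREEN), t) for t in ts]

rgb_steps = step_sizes(rgb_ramp)
oklab_steps = step_sizes(oklab_ramp)

print("sRGB-interpolated steps: ", [round(d, 3) for d in rgb_steps])
print("OKLab-interpolated steps:", [round(d, 3) for d in oklab_steps])
print("sRGB max/min step ratio: ", round(max(rgb_steps) / min(rgb_steps), 2))
```

The OKLab ramp has equal steps by construction, so it serves only as a baseline. The interesting output is the sRGB ramp, whose steps differ in size and whose lightness dips in the middle before rising toward green, which is one way the uneven gradient shows up.
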
The most common color mistakes in data visualization are the following. Rainbow gradients used for quantitative data are not perceptually uniform and create artificial visual emphasis at certain hue boundaries. Brand colors used for data encoding were chosen for brand reasons, not for perceptual distinctiveness. Color used as the only differentiator between categories fails when a chart is printed in black and white or read by someone with color vision deficiency. Overly saturated colors make data series compete for attention instead of supporting comparison. Each of these mistakes stems from applying brand and aesthetic color thinking to a domain where accurate functional encoding is the primary requirement.
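The black-and-white printing problem can be screened for before a chart ships. The sketch below is a rough check under stated assumptions: it approximates grayscale appearance with CIE L* lightness computed from sRGB relative luminance, and it flags any pair of category colors whose lightness values are closer than a threshold. The palette names and hex values, the threshold of 10 L* units, and the function names are all hypothetical and chosen only for illustration. It does not simulate color vision deficiency.

```python
from itertools import combinations

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def lightness(hex_color):
    """Approximate CIE L* (0 to 100) of an sRGB hex color such as '#cc3333'."""
    h = hex_color.lstrip("#")
    r, g, b = (srgb_to_linear(int(h[i:i + 2], 16) / 255) for i in (0, 2, 4))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance
    # CIE L* from luminance, with the linear segment for very dark colors.
    return 116 * y ** (1 / 3) - 16 if y > 216 / 24389 else y * 24389 / 27

# Hypothetical categorical palette; not taken from any published color set.
PALETTE = {
    "red": "#cc3333",
    "green": "#33aa33",
    "blue": "#3355cc",
    "yellow": "#eecc22",
}
MIN_LIGHTNESS_GAP = 10.0  # hypothetical threshold, in L* units

def grayscale_collisions(palette, min_gap=MIN_LIGHTNESS_GAP):
    """Return pairs of category names that would look alike in grayscale."""
    return [
        (name_a, name_b)
        for (name_a, hex_a), (name_b, hex_b) in combinations(palette.items(), 2)
        if abs(lightness(hex_a) - lightness(hex_b)) < min_gap
    ]

collisions = grayscale_collisions(PALETTE)
for name, hex_color in PALETTE.items():
    print(f"{name:7s} {hex_color}  L* = {lightness(hex_color):.1f}")
print("Pairs too close in lightness:", collisions)
```

With these sample values the red and blue entries fall within the threshold of each other. Any flagged pair is a candidate for a second differentiator, such as marker shape, line style, or a direct label, so that the chart does not depend on hue alone.
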