A Comparison of Median and Mean

by Martin Monperrus
Mean and median are two measures that summarize a data set of N numerical values. However, they are not equivalent. Here is a thorough comparison of their properties. In my opinion, it shows that the median is much better (fewer assumptions are required) and should be preferred for scientific analysis.

Comparison

Sensitivity to outliers
Median: The median is much more robust to outliers than the mean (see [1], [2]).

Artificiality
Median: The median yields a "realistic" value, one that is present at least once in the dataset (for an even N, this holds when one of the two middle values is taken rather than their average).
Mean: The mean can be completely artificial (for instance, with two groups of values, the mean may fall in the middle, between the groups; with integer values, the mean is often a non-integer).

Representativeness
Median: The median often gives a better intuition of the underlying data, especially for exponential and long-tail distributions. It is the best choice when one has no idea of the underlying distribution of the data.
Mean: The mean is mostly appropriate for symmetric distributions such as the normal (Gaussian) distribution.

Online computation
Median: Computing the exact median requires storing all the data.
Mean: The mean can be computed "on-line" cheaply, with only the sum and the number of observed values.

Precision
Mean: For certain distributions, the mean has lower variance as an estimator, i.e. better statistical "efficiency" (see [2], [3]).

Partial Information
Median: In certain cases, the theoretical median can be computed by knowing only half of the whole distribution. The median is the value n such that P(k < n) is very close to 0.5. It is exactly computable even if the rightmost tail is infinite or completely irregular.
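The outlier, artificiality, online-computation, and partial-information criteria above can be illustrated with a minimal Python sketch. The sample data, the `RunningMean` class, and the `low_median_from_lower_part` helper are hypothetical names introduced here for illustration; the last one is a finite-sample analogue of the partial-information property, under the assumption that every unknown value is at least as large as the known ones.

```python
import statistics


class RunningMean:
    """Online mean: keeps only the sum and the count of observed values."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def value(self):
        return self.total / self.count


def low_median_from_lower_part(lower_values, n_total):
    """Low median of n_total values when only the smallest ones are known.

    Assumption for this sketch: every unknown value is at least as large
    as max(lower_values); how large does not matter.
    """
    k = (n_total - 1) // 2  # index of the low median in sorted order
    if len(lower_values) <= k:
        raise ValueError("not enough known values to locate the median")
    return sorted(lower_values)[k]


# Hypothetical sample: nine small values and one extreme outlier.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 1000]

mean = statistics.mean(data)              # pulled far away by the outlier
median_low = statistics.median_low(data)  # always an element of the data

print(mean)        # 102.7, a value that does not occur in the data
print(median_low)  # 3

# Online mean over a stream, without storing the values.
running = RunningMean()
for x in data:
    running.add(x)
print(running.value())  # 102.7

# The median is recoverable from the lower part of the sample alone.
print(low_median_from_lower_part([1, 2, 2, 3, 3, 3], n_total=10))  # 3
```

Replacing 1000 by any larger value changes the mean but leaves the median at 3, which is the robustness property in miniature.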


Those characteristics have an impact on the algorithms that work with the median or the mean. For instance, with respect to clustering, K-medoids [using median] is less susceptible to local minima than standard k-means. [4]
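To make the difference between the two clustering approaches concrete, here is a minimal sketch of the center-update step only, on a hypothetical one-dimensional cluster. It assumes absolute difference as the distance, and the function names are illustrative. It does not reproduce the local-minima result cited from [4]; it only shows that a mean-based center is dragged by an outlier while a medoid stays on an actual data point.

```python
def mean_center(points):
    """k-means style center: the arithmetic mean of the cluster (1-D here)."""
    return sum(points) / len(points)


def medoid_center(points):
    """k-medoids style center: the cluster member that minimizes the
    total absolute distance to all members of the cluster."""
    return min(points, key=lambda c: sum(abs(c - p) for p in points))


# Hypothetical 1-D cluster with a single outlier.
cluster = [10, 11, 12, 13, 90]

print(mean_center(cluster))    # 27.2, not a member of the cluster
print(medoid_center(cluster))  # 12, which is also the median here
```

In one dimension with an odd number of points, the medoid under absolute distance coincides with the median, which is presumably what the bracketed "using median" refers to.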


[1] Chapter "Mean versus Median", in "Statistics for the Life Sciences" (Myra L. Samuels, Jeffrey A. Witmer)
[2] Comp Basic: Mean vs. Median (Eliza Polly)
[3] Example of efficiency for mean vs. median (John D. Cook)
[4] A survey of outlier detection methodologies (Hodge, V., & Austin, J.), Artificial Intelligence Review, 2004.