Comparison (green is better)
| | Median | Mean |
|---|---|---|
| Sensitivity to outliers | The median is much more robust to outliers than the mean (see [1], [2]). | |
| Artificiality | The median always yields a "realistic" value, i.e. one that is present at least once in the dataset (provided that, for an even number of values, one of the two middle values is taken rather than their average). | The mean can be completely artificial: with two groups of values, the mean may fall in the middle, where there is no data; with integer values, the mean is often a non-integer real number. |
| Representativeness | The median often gives a better intuition of the underlying data, especially for exponential and long-tailed distributions. It is the best choice when one has no idea of the underlying distribution of the data. | The mean is mostly appropriate for symmetric distributions such as the normal (Gaussian) distribution. |
| Manual computation | Requires ordering the whole dataset. | The mean is much easier to compute (N additions + 1 division). |
| Implementation | With a good API, this is also easy, e.g. in Java: `Arrays.sort(data); return data[data.length/2];` | Much simpler, with basic arithmetic operators only: `sum = 0; for (i = 0; i < data.length; i++) { sum = sum + data[i]; } return sum / data.length;` It can also be computed "on-line", without storing all values. |
| Precision | | For certain distributions, the sample mean varies less from sample to sample than the sample median, i.e. it has a better statistical "efficiency" (see [2], [3]). |
| Partial information | In certain cases, the theoretical median can be computed from only the lower half of the distribution. The median is the value n such that P(k<n) is very close to 0.5. It is exactly computable even if the right-most tail is infinite, or even completely irregular. | |

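
The Implementation, Sensitivity to outliers and Partial Information rows can be illustrated with a small Java sketch. The class name `MedianVsMean`, the nested `RunningMean` class and the `medianFromHead` method are hypothetical names introduced for this example, and the sample values are made up. The sketch assumes the "upper middle value" convention of the snippet in the table, and, for the partial-information case, assumes small non-negative integer values given as per-value counts together with the total sample size.

```java
import java.util.Arrays;

public class MedianVsMean {

    /** Sort-based median; for an even length this picks the upper of the two middle values. */
    public static double median(double[] data) {
        double[] sorted = data.clone(); // copy so the caller's array is left untouched
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Arithmetic mean: N additions and one division. */
    public static double mean(double[] data) {
        double sum = 0;
        for (double x : data) {
            sum += x;
        }
        return sum / data.length;
    }

    /** "On-line" mean: stores only a counter and the current mean, never the values. */
    public static final class RunningMean {
        private long count = 0;
        private double mean = 0;

        public void add(double x) {
            count++;
            mean += (x - mean) / count; // incremental update of the mean
        }

        public double value() {
            return mean;
        }
    }

    /**
     * Median from partial information (assumed setting): headCounts[v] is the number of
     * observations equal to v, known only for the smallest values; total is the full
     * sample size, unknown tail included. Returns the value found at sorted index
     * total / 2, or -1 if the known head does not reach the middle of the sample.
     */
    public static int medianFromHead(long[] headCounts, long total) {
        long cumulative = 0;
        for (int v = 0; v < headCounts.length; v++) {
            cumulative += headCounts[v];
            if (cumulative > total / 2) {
                return v;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[] clean = {1, 2, 3, 4, 5};
        double[] withOutlier = {1, 2, 3, 4, 500};
        System.out.println(median(clean) + " vs " + mean(clean));             // 3.0 vs 3.0
        System.out.println(median(withOutlier) + " vs " + mean(withOutlier)); // 3.0 vs 102.0

        RunningMean running = new RunningMean();
        for (double x : withOutlier) {
            running.add(x);
        }
        System.out.println(running.value());

        // 65 of 100 observations are known (values 0, 1 and 2); the 35 others lie in an unknown tail.
        long[] head = {10, 30, 25};
        System.out.println(medianFromHead(head, 100)); // 2
    }
}
```

In this made-up sample, a single outlier moves the mean from 3 to 102 while the median stays at 3, and the median of the partially known sample is found without looking at the tail at all.
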
These characteristics have an impact on the algorithms built upon the median or the mean. For instance, with respect to clustering, k-medoids (which uses the median) is less susceptible to local minima than standard k-means during training, where k-means often converges to poor-quality clusters [4].
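
To make the difference between the two cluster representatives concrete, here is a minimal Java sketch for a single one-dimensional cluster. The class `MedoidVsCentroid`, its method names and the sample points are assumptions made for this illustration. It only compares the representative chosen for one cluster and does not implement the full k-means or k-medoids training loop, so it says nothing by itself about local minima.

```java
public class MedoidVsCentroid {

    /** Centroid, as used by k-means: the arithmetic mean of the cluster's points. */
    public static double centroid(double[] points) {
        double sum = 0;
        for (double p : points) {
            sum += p;
        }
        return sum / points.length;
    }

    /** Medoid, as used by k-medoids: the member point with the smallest total distance to the others. */
    public static double medoid(double[] points) {
        double best = points[0];
        double bestCost = Double.POSITIVE_INFINITY;
        for (double candidate : points) {
            double cost = 0;
            for (double p : points) {
                cost += Math.abs(candidate - p); // absolute distance in one dimension
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] cluster = {1, 2, 3, 4, 100}; // made-up cluster with one outlier
        System.out.println(centroid(cluster)); // 22.0
        System.out.println(medoid(cluster));   // 3.0
    }
}
```

With absolute distance in one dimension, the medoid coincides with a median of the cluster, which is why it stays among the bulk of the points while the centroid is dragged towards the outlier.
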
[1] Chapter "Mean versus Median", in "Statistics for the Life Sciences" (Myra L. Samuels, Jeffrey A. Witmer)
[2] Comp Basic: Mean vs. Median (Eliza Polly)
[3] Example of efficiency for mean vs. median (John D. Cook)
[4] A survey of outlier detection methodologies (Hodge, V., & Austin, J.), Artificial Intelligence Review, 2004.