Pearson Correlation Coefficient
Pearson's r measures the strength and direction of the linear relationship between two continuous variables, ranging from -1 (perfect negative) to +1 (perfect positive), with 0 meaning no linear relationship.
Formula
r = [nΣxy - ΣxΣy] / √{[nΣx² - (Σx)²][nΣy² - (Σy)²]}
Or equivalently:
r = Σ[(xᵢ-x̄)(yᵢ-ȳ)] / √[Σ(xᵢ-x̄)² × Σ(yᵢ-ȳ)²]
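Both forms of the formula can be checked against each other in a few lines. The following is a minimal Python sketch; the function names and the small data set are made up for illustration and do not come from any real study.

```python
from math import sqrt

def pearson_r_deviation(x, y):
    """Pearson's r from deviations about the means (second formula)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def pearson_r_raw_sums(x, y):
    """Pearson's r from raw sums (first formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical illustrative data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r_deviation(hours, score), 4))  # 0.7746
print(round(pearson_r_raw_sums(hours, score), 4))   # 0.7746
```

The deviation form is usually the safer choice in floating-point arithmetic, because the raw-sum form subtracts two large, nearly equal numbers when the values are far from zero.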
Interpretation Scale
|r| = 0.90–1.00: Very strong
|r| = 0.70–0.89: Strong
|r| = 0.50–0.69: Moderate
|r| = 0.30–0.49: Weak
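|r| = 0.00–0.29: Negligible
These bands are rules of thumb rather than fixed standards. Conventions differ between fields, and the quick-reference table below draws the weak and moderate boundaries slightly lower.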
R² — Coefficient of Determination
R² = r²
r = 0.8 → R² = 0.64
Interpretation: 64% of the variation in Y is explained by the linear relationship with X
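The identity R² = r² holds for a straight-line least-squares fit with one predictor. As a rough check, the sketch below fits such a line and compares 1 − SS_res/SS_tot with r². The helper name and the data are hypothetical.

```python
from statistics import mean

def r_and_r_squared(x, y):
    """Return (r, R2), where R2 = 1 - SS_res / SS_tot for the least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)  # total sum of squares of y
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return r, 1 - ss_res / syy

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r_squared = r_and_r_squared(x, y)
print(round(r, 4), round(r_squared, 4), round(r ** 2, 4))  # 0.7746 0.6 0.6
```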
Critical Warnings
- Correlation ≠ causation (spurious correlations exist)
- Sensitive to outliers — one point can change r dramatically
- Only measures linear relationships (misses curves)
- Test significance: t = r√(n-2)/√(1-r²), df = n-2; compare |t| with the critical value of the t distribution
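The outlier warning is easy to see numerically. In this sketch (made-up data, hypothetical helper name), five nearly collinear points give r close to 1, and adding a single extreme point flips the sign of r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
r_clean = pearson_r(x, y)

# The same data with one extreme point appended.
r_outlier = pearson_r(x + [6], y + [-10.0])

print(round(r_clean, 3))    # close to +1
print(round(r_outlier, 3))  # negative: one point reversed the sign
```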
Correlation Coefficient Quick-Reference Table
| r value | Strength | Direction | Example |
|---|---|---|---|
| +0.90 to +1.00 | Very strong | Positive | Height vs. arm span |
| +0.70 to +0.89 | Strong | Positive | Study hours vs. grade |
| +0.40 to +0.69 | Moderate | Positive | Exercise vs. fitness |
| +0.10 to +0.39 | Weak | Positive | Income vs. happiness |
| −0.10 to +0.10 | Negligible | None | Shoe size vs. IQ |
| −0.40 to −0.69 | Moderate | Negative | Stress vs. sleep quality |
| −0.90 to −1.00 | Very strong | Negative | Altitude vs. oxygen level |
How the Pearson Correlation Coefficient Works
Pearson's r = Σ[(xᵢ−x̄)(yᵢ−ȳ)] / [√Σ(xᵢ−x̄)² × √Σ(yᵢ−ȳ)²]. It measures the strength and direction of the linear relationship between two continuous variables, ranging from −1 (perfect negative linear) through 0 (no linear relationship) to +1 (perfect positive linear). r² (coefficient of determination) is the proportion of variance in y explained by x in simple linear regression.
Spearman's rank correlation (ρ) is a non-parametric alternative that uses ranks instead of raw values. It is appropriate for ordinal data, for non-linear but monotonic relationships, and when outliers would distort Pearson's r. Point-biserial correlation handles the case of one dichotomous and one continuous variable. For two categorical variables, use Cramér's V or, for 2×2 tables, the phi coefficient.
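Spearman's ρ can be computed as Pearson's r applied to ranks. The sketch below uses hypothetical helper names and made-up data, and assigns average ranks to ties. It shows a monotonic but curved relationship (y = x³) where ρ is exactly 1 while r is lower.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly non-linear
print(round(pearson_r(x, y), 4))     # 0.9431
print(round(spearman_rho(x, y), 4))  # 1.0
```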
Common Mistakes
- Assuming causation from correlation: Countries with more TVs per capita have higher life expectancy — but TVs don't cause longevity. Both are driven by wealth. Always consider confounding variables before inferring causation.
- Using Pearson r for non-linear relationships: r measures only linear association. Two perfectly related variables following a U-curve may give r = 0. Always plot a scatter graph first.
- Ignoring statistical significance: A correlation of r = 0.8 is not necessarily significant if n = 3 (df = 1). Test H₀: ρ = 0 using t = r√(n−2)/√(1−r²) with df = n−2. With n = 10, r = 0.8 gives t = 3.77 (p < 0.01).
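Two of the points above can be reproduced directly: the U-curve with r = 0, and the t statistic for r = 0.8 at n = 3 and n = 10. This is a minimal sketch with hypothetical helper names. The critical values are the usual two-tailed 5% entries from a t-table, hard-coded here rather than computed.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# A perfect U-shaped relationship: y is fully determined by x, yet r is 0.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r_u_curve = pearson_r(x, y)
print(r_u_curve)  # 0.0

# Two-tailed 5% critical values from a standard t-table, keyed by df.
T_CRIT_05 = {1: 12.706, 8: 2.306}

for n in (3, 10):
    t = t_statistic(0.8, n)
    df = n - 2
    print(n, round(t, 2), abs(t) > T_CRIT_05[df])
# 3 1.33 False
# 10 3.77 True
```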
Frequently Asked Questions
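How large a sample do I need?
A minimum of n = 30 is often cited as the point at which the sampling distribution of r is approximately normal, but the sample size you need depends on the effect you want to detect. To detect r = 0.3 (a medium effect by Cohen's convention) with 80% power at α = 0.05 (two-tailed), you need n ≈ 84. Small samples produce unstable estimates of r with very wide confidence intervals: an r = 0.5 from n = 10 has a 95% CI of roughly [−0.19, 0.86] by the Fisher z-transformation, which is almost useless for interpretation.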
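Figures of this kind can be approximated with the Fisher z-transformation, z = atanh(r), whose standard error is about 1/√(n − 3). The sketch below is one common approximation, not an exact method; the function names are hypothetical. Exact power software can give answers that differ by a participant or so, and other interval methods give somewhat different bounds.

```python
from math import atanh, tanh, sqrt, ceil
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

def approx_sample_size(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a true correlation r (two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

low, high = fisher_ci(0.5, 10)
print(round(low, 2), round(high, 2))  # interval for r = 0.5, n = 10
print(approx_sample_size(0.3))        # approximate n for r = 0.3, 80% power
```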
What is the difference between Pearson and Spearman correlation?
Pearson's r measures linear association and expects interval or ratio data; its significance test also assumes approximate bivariate normality. Spearman's ρ converts the data to ranks and measures monotonic (but not necessarily linear) association, so it is valid for ordinal data and more robust to outliers. Use Spearman when data are ordinal (Likert scales, rankings), when outliers are present, or when the relationship appears curved but monotonic on a scatter plot.
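How is correlation used in portfolio diversification?
For two assets, portfolio variance = w₁²σ₁² + w₂²σ₂² + 2w₁w₂σ₁σ₂ρ₁₂, where w₁ and w₂ are the portfolio weights, σ₁ and σ₂ the asset volatilities, and ρ₁₂ the correlation between their returns. When ρ₁₂ < 1, the portfolio's volatility is lower than the weighted average of the two assets' volatilities, which is the core of diversification theory. Perfect negative correlation (ρ₁₂ = −1) would, with suitably chosen weights, theoretically eliminate all risk. In practice, asset correlations tend to increase during market crises (flight to safety), reducing diversification benefits precisely when they are most needed.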
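The effect of ρ on a two-asset portfolio can be tabulated with a short sketch. The volatilities and weights below are invented for illustration, and the function name is hypothetical. The last part uses weights proportional to the other asset's volatility, which is the choice that drives variance to zero when ρ = −1.

```python
from math import sqrt

def portfolio_variance(w1, w2, sigma1, sigma2, rho):
    """Two-asset portfolio variance: w1²σ1² + w2²σ2² + 2·w1·w2·σ1·σ2·ρ."""
    return (w1 ** 2 * sigma1 ** 2
            + w2 ** 2 * sigma2 ** 2
            + 2 * w1 * w2 * sigma1 * sigma2 * rho)

# Hypothetical assets: 20% and 10% annual volatility, held 50/50.
sigma1, sigma2 = 0.20, 0.10
for rho in (1.0, 0.5, 0.0, -0.5):
    vol = sqrt(portfolio_variance(0.5, 0.5, sigma1, sigma2, rho))
    print(rho, round(vol, 4))
# Volatility falls from 0.15 at rho = 1.0 as the correlation decreases.

# With rho = -1, weights w1 = σ2/(σ1+σ2) and w2 = σ1/(σ1+σ2) cancel the risk.
w1 = sigma2 / (sigma1 + sigma2)
w2 = sigma1 / (sigma1 + sigma2)
hedged_variance = portfolio_variance(w1, w2, sigma1, sigma2, -1.0)
print(abs(hedged_variance) < 1e-12)  # True, up to floating-point rounding
```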