Correlation Calculator

Quick Answer

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the Pearson correlation is r = 0.7746 (strong positive). For x = [1, 2, 3] and y = [2, 4, 6], r = 1.0 (perfect positive correlation).

X Values

Separate values with commas, spaces, or newlines

Y Values

Each Y value pairs with the X value in the same position
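
The input rules above (comma, space, or newline separators, with X and Y paired by position) can be sketched in Python. This is only an illustration under stated assumptions, not the calculator's actual code: the helper names parse_values and parse_pairs are hypothetical, and rejecting lists of unequal length is an assumed behavior.

```python
import re


def parse_values(text):
    """Split text on commas and/or whitespace and convert each token to float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]


def parse_pairs(x_text, y_text):
    """Pair each Y value with the X value in the same position."""
    xs, ys = parse_values(x_text), parse_values(y_text)
    if len(xs) != len(ys):
        # Assumed behavior: mismatched lengths are treated as an input error.
        raise ValueError("X and Y must contain the same number of values")
    return list(zip(xs, ys))


print(parse_pairs("1, 2, 3", "2 4\n6"))
```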

Common Examples

Input	Result
X = [1, 2, 3], Y = [2, 4, 6]	r = 1.0000 (perfect positive)
X = [1, 2, 3], Y = [6, 4, 2]	r = -1.0000 (perfect negative)
X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5]	r = 0.7746 (strong positive)
X = [10, 20, 30, 40, 50], Y = [12, 19, 28, 42, 51]	r ≈ 0.9936 (very strong positive)

How It Works

The formula

The Pearson correlation coefficient is defined as:

\[r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}\]

Where n is the number of paired data points, x and y are the individual values, and the sums run over all pairs.
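
As a minimal sketch, the formula above translates almost term by term into Python. The function name pearson_r and the choice to raise an error for constant input (where the denominator is zero and r is undefined) are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the computational formula (sums of x, y, xy, x^2, y^2)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired values are required")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term == 0 or y_term == 0:
        # A constant X or Y has zero variance, so r is undefined.
        raise ValueError("r is undefined when X or Y is constant")
    return numerator / math.sqrt(x_term * y_term)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```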

Range and interpretation

r always falls between -1 and +1:

r = +1: perfect positive linear relationship (every point lies on a straight line that rises as X increases)
r = -1: perfect negative linear relationship (every point lies on a straight line that falls as X increases)
r = 0: no linear relationship between X and Y

Strength scale

|r| range	Label
0.9 to 1.0	Very strong
0.7 to 0.9	Strong
0.5 to 0.7	Moderate
0.3 to 0.5	Weak
0.0 to 0.3	Very weak or none
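
The scale can be expressed as a small lookup on the absolute value of r. The table's ranges share their endpoints, so this sketch assumes each lower bound is inclusive (for example, |r| = 0.7 is labeled "Strong"); the function name strength_label is hypothetical.

```python
def strength_label(r):
    """Map r to the strength labels in the table, using |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    # Assumption: a value on a boundary takes the stronger label.
    if magnitude >= 0.9:
        return "Very strong"
    if magnitude >= 0.7:
        return "Strong"
    if magnitude >= 0.5:
        return "Moderate"
    if magnitude >= 0.3:
        return "Weak"
    return "Very weak or none"


print(strength_label(0.7746))  # Strong
print(strength_label(-0.95))   # Very strong
```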

R² (coefficient of determination)

R² is the square of r. It represents the proportion of variance in Y that is explained by a linear relationship with X. For example, r = 0.8 gives R² = 0.64, meaning a straight-line fit on X accounts for 64% of the variation in Y. The remaining 36% is left unexplained by that fit.
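
To see where the "proportion of variance explained" reading comes from, the sketch below fits a least-squares line and computes 1 − SS_res / SS_tot directly. For a simple linear fit this equals r². The function name and the sample data (taken from this page's examples) are illustrative choices.

```python
def r_squared_from_fit(xs, ys):
    """R^2 as 1 - (residual sum of squares / total sum of squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Least-squares slope and intercept of the line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# For x = [1, 2, 3, 4, 5], y = [2, 4, 5, 4, 5], r is about 0.7746, so r^2 is 0.6.
print(round(r_squared_from_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.6
```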

Correlation vs. causation

A high r value shows that two variables move together, but it does not establish that one causes the other. A third variable may drive both, or the relationship may be coincidental. Correlation is a measure of association, not causation.

Worked example

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5]:

n = 5, Σx = 15, Σy = 20, Σxy = 1×2 + 2×4 + 3×5 + 4×4 + 5×5 = 66, Σx² = 55, Σy² = 86

Numerator: 5 × 66 − 15 × 20 = 330 − 300 = 30

Denominator: √[(5 × 55 − 225)(5 × 86 − 400)] = √[(275 − 225)(430 − 400)] = √[50 × 30] = √1500 ≈ 38.73

r = 30 / 38.73 ≈ 0.7746, R² = 30² / 1500 = 0.6
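
As a cross-check on the arithmetic above, the same r can be computed from the algebraically equivalent mean-centered form, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). This is a sketch for verification only; the function name is hypothetical.

```python
import math


def pearson_r_centered(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r_centered(x, y)
print(round(r, 4), round(r * r, 4))  # 0.7746 0.6
```

Here the deviations give Σ(x − x̄)(y − ȳ) = 6, Σ(x − x̄)² = 10, and Σ(y − ȳ)² = 6, so r = 6 / √60, which matches 30 / √1500.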

Frequently Asked Questions

What does r = 0 mean?

r = 0 means there is no linear relationship between the two variables. The data points show no consistent upward or downward trend when plotted. Note that r = 0 does not rule out a non-linear relationship; for example, data that follow a symmetric U-shaped curve give r at or near 0 even though Y depends entirely on X.

What is the difference between r and R²?

r (the correlation coefficient) measures the direction and strength of the linear relationship, ranging from -1 to +1. R² is r squared, ranging from 0 to 1, and represents the proportion of variance in Y explained by X. For example, r = 0.7 gives R² = 0.49, meaning X explains 49% of the variation in Y. R² is never negative, so it shows the strength of the relationship but not its direction.

Does correlation prove causation?

No. Correlation only shows that two variables tend to move together. A causal link requires additional evidence, such as a controlled experiment or a well-reasoned mechanism. Classic examples of spurious correlations include ice cream sales and drowning rates (both rise in summer due to heat) and the number of Nicolas Cage films released and pool drowning deaths.

Can Pearson r detect non-linear relationships?

No. Pearson r measures only linear association. A strong curved relationship (such as a quadratic or exponential pattern) can produce an r well below 1, or even near 0, when the variables are clearly related. Always plot your data before interpreting r. If the scatterplot shows a curve that consistently rises or falls, Spearman's rank correlation may be more appropriate; for other shapes, consider a non-linear model.
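
A short sketch makes the point concrete. The data sets are invented for illustration: y = x² on values of x symmetric about zero, and y = 2^x on x = 0 to 7. The spearman_rho helper is a simplified version that assumes no tied values; it is not a full implementation of Spearman's method.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    x_term = n * sum(x * x for x in xs) - sum_x ** 2
    y_term = n * sum(y * y for y in ys) - sum_y ** 2
    return numerator / math.sqrt(x_term * y_term)


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation as Pearson r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Symmetric U-shape: Y is fully determined by X, yet there is no linear trend.
x_u = [-3, -2, -1, 0, 1, 2, 3]
y_u = [x * x for x in x_u]
print(pearson_r(x_u, y_u))  # 0.0

# Exponential growth: Y always rises with X, but not along a straight line.
x_e = list(range(8))
y_e = [2 ** x for x in x_e]
print(round(pearson_r(x_e, y_e), 2), spearman_rho(x_e, y_e))
```

In the second data set Pearson r falls short of 1 because the points do not lie on a line, while the rank-based measure reaches 1 because Y rises every time X does.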

How many data points do I need for a reliable result?

With fewer than 10 paired observations, r is highly sensitive to individual values and the margin of error is large. As a general rule, 30 or more data points produce a more stable estimate. The significance of a given r value also depends on sample size: r = 0.5 may be statistically significant with n = 20 but not with n = 5. Use a t-test or p-value calculation to assess statistical significance.
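
The usual t-test for a correlation uses t = r √((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The sketch below applies it to the two cases mentioned above. The critical values are standard two-tailed t-table entries at the 5% level, hard-coded here for just these two sample sizes; the 5% threshold itself is an assumption, since the text does not name a significance level.

```python
import math


def t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    if n < 3:
        raise ValueError("at least three paired values are required")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_T_05 = {3: 3.182, 18: 2.101}

for n in (20, 5):
    t = t_statistic(0.5, n)
    critical = CRITICAL_T_05[n - 2]
    print(f"n = {n}: t = {t:.3f}, critical = {critical}, significant = {abs(t) > critical}")
```

With r = 0.5, the statistic is about 2.449 for n = 20 (above 2.101) and exactly 1 for n = 5 (well below 3.182), which matches the statement above.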