Quick Answer
For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the best-fit line is y = 0.6x + 2.2 with R² = 0.60. For a perfect linear dataset like x = [1, 2, 3] and y = [2, 4, 6], the line is y = 2x + 0 with R² = 1.00.
Common Examples
| Input | Result |
|---|---|
| X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5] | y = 0.6x + 2.2, R² = 0.60 |
| X = [1, 2, 3], Y = [2, 4, 6] | y = 2x + 0, R² = 1.00 |
| X = [10, 20, 30, 40], Y = [15, 25, 35, 45] | y = 1x + 5, R² = 1.00 |
| X = [1, 2, 3], Y = [6, 4, 2] | y = -2x + 8, R² = 1.00 |
How It Works
The least-squares formulas
Given n paired observations (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), the slope and intercept of the best-fit line are:
\[m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}\] \[b = \bar{y} - m\bar{x}\]where \(\bar{x}\) and \(\bar{y}\) are the sample means. The formulas minimize the sum of squared residuals \(\sum (y_i - \hat{y}_i)^2\), which is why this method is called least squares.
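These formulas translate directly into code. A minimal Python sketch (the helper name `fit_line` is illustrative, not part of any library):

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Slope from the closed-form least-squares formula
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    # Intercept from b = ȳ - m·x̄
    b = sy / n - m * (sx / n)
    return m, b

m, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # m ≈ 0.6, b ≈ 2.2
```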
R² (coefficient of determination)
R² measures how much of the variation in y is explained by the linear relationship with x:
\[R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}\]An R² of 1 means the line fits the data perfectly. An R² of 0 means the line explains none of the variation in y. As a rough guide, values above 0.9 indicate a strong fit, while values below 0.4 indicate a weak linear relationship.
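The R² formula is a direct ratio of sums of squares. A short sketch, assuming the slope and intercept have already been computed (here the m = 0.6, b = 2.2 fit from the quick answer):

```python
def r_squared(xs, ys, m, b):
    """R² = 1 - SS_res / SS_tot for the line y = m·x + b."""
    y_mean = sum(ys) / len(ys)
    # Residual sum of squares: deviations from the fitted line
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: deviations from the mean of y
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)  # ≈ 0.60
```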
Pearson r vs R²
Pearson r (the correlation coefficient) measures the direction and strength of the linear relationship, ranging from -1 to +1. R² equals r² for simple linear regression, so R² is always non-negative. A negative slope produces a negative r even though R² is positive.
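The sign distinction is easy to see on the decreasing dataset from the examples table. A sketch of Pearson r (the helper name `pearson_r` is illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, in [-1, 1]."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx * sx) * (n * syy - sy * sy)
    )

# Perfectly decreasing data: r is -1, but R² = r² is +1
r = pearson_r([1, 2, 3], [6, 4, 2])
```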
Worked example
For the dataset x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5]:
- n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55
- m = (5 × 66 - 15 × 20) / (5 × 55 - 15²) = (330 - 300) / (275 - 225) = 30 / 50 = 0.6
- b = (20/5) - 0.6 × (15/5) = 4 - 1.8 = 2.2
- ŷ = [2.8, 3.4, 4.0, 4.6, 5.2], so SS_res = 2.4 and SS_tot = 6, giving R² = 1 - 2.4/6 = 0.60
- Equation: y = 0.6x + 2.2, R² = 0.60
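The worked example can be verified numerically. A quick check using NumPy's `polyfit` (assuming NumPy is installed), which performs the same least-squares fit:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# polyfit with degree 1 returns [slope, intercept]
m, b = np.polyfit(x, y, 1)

# R² from the residuals of the fitted line
y_hat = m * x + b
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# m ≈ 0.6, b ≈ 2.2, r2 ≈ 0.60
```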
When to use linear regression
Linear regression is appropriate when the relationship between x and y is roughly linear, residuals are approximately normally distributed, and the variance of residuals is roughly constant. For curved relationships, consider polynomial regression or a logarithmic transformation before applying a linear model.
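For curved data, the polynomial option mentioned above is a one-line change with `numpy.polyfit`. A sketch on deliberately quadratic data (assuming NumPy is installed):

```python
import numpy as np

# Quadratic data: a straight line underfits, but a degree-2
# polynomial recovers y = x² almost exactly.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = x ** 2

# Coefficients in descending order of degree: [a, b, c] for ax² + bx + c
coeffs = np.polyfit(x, y, 2)  # ≈ [1, 0, 0]
```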