Linear Regression Calculator

Least-squares linear regression finds the line y = mx + b that minimizes the sum of squared vertical distances from each data point to the line. The slope is m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and the intercept is b = ȳ - mx̄. For the dataset x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the regression line is y = 0.6x + 2.2 with R² = 0.60. Enter your X and Y values below to compute the equation, R², and optional predictions.

Quick Answer

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], the best-fit line is y = 0.6x + 2.2 with R² = 0.60. For a perfect linear dataset like x = [1, 2, 3] and y = [2, 4, 6], the line is y = 2x + 0 with R² = 1.00.

Common Examples

Input → Result
X = [1, 2, 3, 4, 5], Y = [2, 4, 5, 4, 5] → y = 0.6x + 2.2, R² = 0.60
X = [1, 2, 3], Y = [2, 4, 6] → y = 2x + 0, R² = 1.00
X = [10, 20, 30, 40], Y = [15, 25, 35, 45] → y = 1x + 5, R² = 1.00
X = [1, 2, 3], Y = [6, 4, 2] → y = -2x + 8, R² = 1.00

How It Works

The least-squares formulas

Given n paired observations (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ), the slope and intercept of the best-fit line are:

\[m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}\] \[b = \bar{y} - m\bar{x}\]

where \(\bar{x}\) and \(\bar{y}\) are the sample means. The formulas minimize the sum of squared residuals \(\sum (y_i - \hat{y}_i)^2\), which is why this method is called least squares.
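The summation formulas translate directly into a few lines of Python. This is an illustrative sketch (the `fit_line` helper name is ours, not part of the calculator):

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b via the summation formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = sy / n - m * (sx / n)  # b = y-bar - m * x-bar
    return m, b

m, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# m ≈ 0.6, b ≈ 2.2
```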

R² (coefficient of determination)

R² measures how much of the variation in y is explained by the linear relationship with x:

\[R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}\]

An R² of 1 means the line fits the data perfectly. An R² of 0 means the line explains none of the variation in y. As rough rules of thumb, values above 0.9 indicate a strong fit and values below 0.4 indicate a weak linear relationship.
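The ratio form of R² is just as direct to compute. A minimal sketch (hypothetical `r_squared` helper), assuming the line y = mx + b has already been fitted:

```python
def r_squared(xs, ys, m, b):
    """R^2 = 1 - SS_res / SS_tot for the fitted line y = m*x + b."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)
# ≈ 0.60 (SS_res = 2.4, SS_tot = 6)
```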

Pearson r vs R²

Pearson r (the correlation coefficient) measures the direction and strength of the linear relationship, ranging from -1 to +1. R² equals r² for simple linear regression, so R² is always non-negative. A negative slope produces a negative r even though R² is positive.
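The sign behavior is easy to see on the negative-slope dataset from the examples above. A sketch using the standard summation formula for r (the `pearson_r` name is ours):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the summation formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

r = pearson_r([1, 2, 3], [6, 4, 2])
# r = -1.0 (perfect negative relationship), while R² = r² = 1.0
```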

Worked example

For the dataset x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5]:

  • n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55
  • m = (5 × 66 - 15 × 20) / (5 × 55 - 15²) = (330 - 300) / (275 - 225) = 30 / 50 = 0.6
  • b = (20/5) - 0.6 × (15/5) = 4 - 1.8 = 2.2
  • R²: SSres = 2.4 and SStot = 6, so R² = 1 - 2.4/6 = 0.60
  • Equation: y = 0.6x + 2.2, R² = 0.60

When to use linear regression

Linear regression is appropriate when the relationship between x and y is roughly linear, residuals are approximately normally distributed, and the variance of residuals is roughly constant. For curved relationships, consider polynomial regression or a logarithmic transformation before applying a linear model.
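The transformation idea can be sketched with made-up exponential data: fitting a line to (x, ln y) instead of (x, y) recovers an exponential model y = e^b · e^(mx). The `fit_line` helper below is illustrative, not part of the calculator:

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept via the summation formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return m, sy / n - m * (sx / n)

# Hypothetical exponential data: y doubles with each unit increase in x.
xs = [1, 2, 3, 4]
ys = [2.0, 4.0, 8.0, 16.0]

m, b = fit_line(xs, [math.log(y) for y in ys])
# Back-transform: y = exp(b) * exp(m * x); here m ≈ ln 2 and exp(b) ≈ 1,
# i.e. the recovered model is y ≈ 2^x.
```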

Frequently Asked Questions

What does R² mean?
R² (the coefficient of determination) measures the proportion of variance in the Y values explained by the linear model. An R² of 1.00 means the regression line passes through every data point exactly. An R² of 0.00 means the line explains none of the variation in Y, and a horizontal line at the mean of Y would perform equally well. In practice, R² above 0.9 is considered a strong fit for most applications.
What is the difference between R² and Pearson r?
Pearson r (the correlation coefficient) measures the direction and strength of the linear association between X and Y, ranging from -1 to +1. R² is simply r squared, so it is always between 0 and 1 and carries no sign. For simple linear regression with one predictor, R² = r². The sign of the slope tells you whether the relationship is positive or negative; R² alone does not.
Can linear regression have a negative slope?
Yes. A negative slope means Y tends to decrease as X increases. For example, the dataset X = [1, 2, 3] and Y = [6, 4, 2] produces a slope of -2 with R² = 1.00. The Pearson r in this case is -1, indicating a perfect negative linear relationship.
How many data points do I need?
A minimum of 2 points is required to define a line, but a regression fit on 2 points always produces R² = 1.00 regardless of any actual linear relationship. Reliable regression estimates generally require at least 10 to 20 data points. With very small samples the fit is sensitive to individual observations and the results may not generalize.
What if my data is not linear?
If the scatter plot of X vs Y shows a curve, a linear model will produce a poor fit with low R². Common remedies include transforming X or Y (for example, taking the logarithm of Y for exponential growth), fitting a polynomial regression, or using a different model entirely. Plotting the residuals (actual Y minus predicted Y) against X is a quick diagnostic: a pattern in the residuals confirms that the linear assumption is violated.
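The residual diagnostic can be sketched in a few lines. Here we use made-up quadratic data so the pattern is unmistakable:

```python
# Fit a line to curved (quadratic) data and inspect the residuals.
xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]  # 1, 4, 9, 16, 25 — clearly nonlinear

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # 6.0
b = sy / n - m * (sx / n)                      # -7.0

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0] — a U shape: +, -, -, -, +
```

The sign pattern (positive at the ends, negative in the middle) is exactly the systematic structure that signals a violated linearity assumption; random scatter around zero would support the linear model.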
