Polynomial trend equation and forecast
The charts above show polynomial trendlines for the same data with a degree of 2 - the minimum degree of a polynomial trendline in Excel - and 4.
The degree of the polynomial determines the maximum number of extrema - local minima or maxima of the curve:
- A polynomial trendline of the second degree (also known as a Quadratic polynomial trend line) can describe only one maximum or minimum (one hill or valley).
- A polynomial trendline of the third degree (also known as a Cubic polynomial trend line) has one or two extrema.
The maximum degree of the polynomial trendline that can be displayed in an Excel chart is 6:
The degree of the polynomial is specified in the Order field - see more about the polynomial trendline in Excel.
Excel uses the least squares method to fit a polyline for the data points.
For a trend line, two sets of variables are required:
- an independent, predictor, or explanatory variable x (named as known_x's),
- a dependent or response variable y (named as known_y's).
Polynomial trendline equation and formulas
The polynomial trendline is a curved line with the function:
y = an*xn + ... + a3*x3 + a2*x2 + a1*x + b
where:
- an, ..., a3, a2, a1 and b are calculated parameters of the function (also named function coefficients or constants) that describe the relationship between x and y,
- n is the degree of the polynomial (in Excel, the order of a polynomial). Excel works with polynomial orders from 2nd to 6th.
Note: Excel uses a linear model: the polynomial function y = an*xn + ... + a3*x3 + a2*x2 + a1*x + b is transformed to a linear function using a change of variables. After changing the variables to Xi = xi, the calculations are completely similar to the linear case y = an*Xn + ... + a3*X3 + a2*X2 + a1*x + b. Now the variable y does not depend on one variable x in n different degrees but on n independent variables Xi.
Excel offers the following functions to calculate the trend parameters:
Quadratic (2nd order or 2nd degree) polynomial trendline
- Equation: y = a2*x2 + a1*x + b
- a2: = INDEX (LINEST (known_y's, known_x's ^ {1,2}), 1)
- a1: = INDEX (LINEST (known_y's, known_x's ^ {1,2}), 1, 2)
- b: = INDEX (LINEST (known_y's, known_x's ^ {1,2}), 1, 3)
Cubic (3rd order or 3rd degree) polynomial trendline
- Equation: y = a3*x3 + a2*x2 + a1*x + b
- a3: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3}), 1)
- a2: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3}), 1, 2)
- a1: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3}), 1, 3)
- b: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3}), 1, 4)
The 4th order or 4th degree polynomial trendline
- Equation: y = a4*x4 + a3*x3 + a2*x2 + a1*x + b
- a4: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4}), 1)
- a3: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4}), 1, 2)
- a2: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4}), 1, 3)
- a1: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4}), 1, 4)
- b: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4}), 1, 5)
The 6th order or 6th degree polynomial trendline
- Equation: y = a6*x6 + a5*x5 + a4*x4 + a3*x3 + a2*x2 + a1*x + b
- a6: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1)
- a5: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 2)
- a4: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 3)
- a3: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 4)
- a2: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 5)
- a1: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 6)
- b: = INDEX (LINEST (known_y's, known_x's ^ {1,2,3,4,5,6}), 1, 7)
The array function LINEST () calculates the linear trend that fits best the known data and then returns an array with statistics describing the trend line:
= LINEST (known_y's, [known_x's], [const], [stats])
where:
- known_y's (required) are dependent data values,
- known_x's (optional) are independent data values:
These variables cannot be omitted for the correct calculation for a polynomic trend. In any case, you should provide the known_x's or the array {1,2,3,...} that is the same size as known_y's,
- const (optional) is a Boolean value specifying whether to force the parameter b (intercept) to equal 0:
- If const is TRUE or omitted, the parameter b is calculated normally,
- If const is FALSE, the parameter b is set equal to 0.
- stats (optional) is a Boolean value specifying whether to return additional regression statistics:
- If stats is TRUE, the LINEST () function returns the additional regression statistics in the form of an array:
where:
Line Statistics Description 1 an *** a1 b Parameters of the function 2 sean *** sea1 seb Standard error of each parameter 3 R2 R-squared value sey Standard error of y estimate (also known as Standard Error of a regression) shows how big the prediction error is 4 F F statistic (F-observed value) dr Degrees of freedom 5 ssreg Regression sum of squares (also known as SSR) ssresid Residual sum of squares (also known as SSE) - If stats is FALSE or omitted, the function returns only the calculated parameters.
- If stats is TRUE, the LINEST () function returns the additional regression statistics in the form of an array:
For the 6th degree polynomial trend:
Notes:
- If you have a version of Microsoft Excel other than Microsoft 365, functions that return arrays must be entered as array functions using Ctrl+Shift+Enter unless you can just press Enter.
- Because the LINEST () function returns an array of data, you can use the INDEX () function to display only the desired parameter(s) from the returned data array.
Calculating the polynomial trendline values
To calculate the polynomial trendline values, Excel offers the array function TREND () (see the notes above about array functions) that returns values along a linear trend. To work with this function for polynomial trends, just use the indicated powers of the known_x's instead of known_x's:
= TREND (known_y's, [known_x's], [new_x's], [const])
where:
- known_y's (required) are dependent data values,
- known_x's (optional) are independent data values:
These values cannot be omitted for the correct calculation for a polynomial trend. In any case, you should provide the known_x's or the array {1,2,3,...} that is the same size as known_y's,
- new_x's (optional) are the new x-values for which the TREND () function should calculate corresponding y-values:
- If you omit new_x's, it is assumed to be the same as known_x's,
- const (optional) is a Boolean value specifying whether to force the parameter b to 0:
- If const is TRUE or omitted, the parameter b is calculated normally,
- If const is FALSE, the parameter b is set to 0.
R-squared value
The R-squared value (in Excel, R2) calculates how well a trendline fits the data. The R-squared value can vary from 0 to 1: the nearer R2 is to 1, the better the trendline fits the data.
The R-squared value can be displayed on the chart (see how to turn on the R-squared value). You can easily calculate that value using one of the methods:
- Use the LINEST () function to return the first statistics of the third line of the statistics array:
= INDEX (LINEST (known_y's, known_x's ^ {1,2,3,...}, , TRUE), 3, 1):
- Use the RSQ () function that returns the R-squared value of two data sets:
= RSQ (array1, array2)
where array1 and array2 (both are required) are two data sets: