If \(m > 3\), then the system is overdetermined and will generally not have an exact solution, so instead we look for the coefficients that give the best fit in the least squares sense.
This is sometimes called *quadratic regression*.
Let
\[ A = \begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{pmatrix}, \qquad x = \begin{pmatrix} c_0 \\ c_1 \\ c_2 \end{pmatrix}, \qquad b = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}. \]
Then the system of equations is \(A x = b\).
Find the least squares approximate solution for the following data:
\[ (-1, 2),\ (0, 0),\ (1, 1),\ (2, 3). \]
That is, find \(c_0, c_1, c_2\) such that \(\|A x - b\|\) is minimised.
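As a numerical check (a sketch using NumPy's `lstsq`, not a substitute for the written working), the least squares coefficients for these four points can be computed as follows:

```python
import numpy as np

# Data points (t_i, y_i) from the problem.
t = np.array([-1.0, 0.0, 1.0, 2.0])
b = np.array([2.0, 0.0, 1.0, 3.0])

# Design matrix with columns 1, t, t^2, so x = (c0, c1, c2).
A = np.column_stack([np.ones_like(t), t, t**2])

# Minimise ||Ax - b|| in the least squares sense.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
c0, c1, c2 = x
print(c0, c1, c2)  # → approximately 0.3, -0.6, 1.0
```

Checking the answer against the normal equations \(A^T A x = A^T b\) by hand is a good exercise; both approaches give the same quadratic.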
### Problem 5
Find the line \(y = c_0 + c_1 x\) that best fits the following data in the least squares sense:
\[ (0, 1),\ (1, 3),\ (2, 4),\ (3, 4). \]
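The same `lstsq` check works here (a sketch for verifying your hand computation, with the design matrix now having columns \(1\) and \(x\)):

```python
import numpy as np

# Data points (x_i, y_i) from Problem 5.
x_data = np.array([0.0, 1.0, 2.0, 3.0])
y_data = np.array([1.0, 3.0, 4.0, 4.0])

# Design matrix with columns 1, x, so the unknowns are (c0, c1).
A = np.column_stack([np.ones_like(x_data), x_data])
(c0, c1), *_ = np.linalg.lstsq(A, y_data, rcond=None)
print(c0, c1)  # → approximately 1.5, 1.0
```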
### Problem 6
An inner product on a real vector space \(V\) is a function that takes two vectors \(v\) and \(w\) in \(V\) and gives a real number \(\langle v, w\rangle\) such that:
1. \(\langle v, w\rangle = \langle w, v\rangle\) for all \(v, w \in V\) (symmetry),
2. \(\langle u+v, w\rangle = \langle u, w\rangle + \langle v, w\rangle\) for all \(u, v, w \in V\) (additivity),
3. \(\langle \alpha v, w\rangle = \alpha \langle v, w\rangle\) for all \(\alpha \in \mathbb{R}\) and \(v, w \in V\) (homogeneity),
4. \(\langle v, v\rangle \ge 0\) for all \(v \in V\), and \(\langle v, v\rangle = 0\) if and only if \(v = 0\) (positive definiteness).
For example, the dot product on \(\mathbb{R}^n\) is an inner product.
Given an inner product \(\langle\ ,\ \rangle\) on \(V\), we can define the *norm* of a vector \(v\) to be \(\|v\| = \sqrt{\langle v, v\rangle}\), and we can define two vectors \(v\) and \(w\) to be *orthogonal* if \(\langle v, w\rangle = 0\).
Now suppose we have an inner product on \(\mathbb{R}^m\) which is *different* from the usual dot product. We can still pose a least squares problem: given an \(m \times n\) matrix \(A\) and a vector \(b \in \mathbb{R}^m\), we want to find \(x \in \mathbb{R}^n\) such that \(\|A x - b\|\) is as small as possible, where now \(\|\ \|\) is the norm coming from this inner product on \(\mathbb{R}^m\).
Let’s denote this inner product by \(\langle\ ,\ \rangle\). Because it is an inner product on \(\mathbb{R}^m\), there is a symmetric, positive definite \(m \times m\) matrix \(M\) such that
\[ \langle v, w\rangle = v^T M w \]
for all \(v, w \in \mathbb{R}^m\). (You don’t need to prove this, but you might like to think about why it is true.)
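As a small illustration (the matrix \(M\) below is an arbitrary symmetric positive definite choice, not one from the problem), the formula \(\langle v, w\rangle = v^T M w\) can be checked numerically against the axioms:

```python
import numpy as np

# An arbitrary symmetric positive definite matrix (illustrative only).
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def inner(v, w):
    """The inner product <v, w> = v^T M w."""
    return v @ M @ w

v = np.array([1.0, -2.0])
w = np.array([0.5, 4.0])

print(np.isclose(inner(v, w), inner(w, v)))  # symmetry → True
print(inner(v, v) > 0)                       # positivity for v != 0 → True
```

Bilinearity follows from matrix algebra, and positive definiteness of \(M\) gives \(\langle v, v\rangle > 0\) for \(v \ne 0\).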
Show that the least squares solution satisfies
\[ A^T M A x = A^T M b. \]
*(Hint: the least squares solution is characterised by \(A x - b\) being orthogonal to the column space of \(A\) with respect to the inner product. This condition is \(\langle A x - b, A y\rangle = 0\) for all \(y \in \mathbb{R}^n\).)*
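The hint can also be explored numerically (a sketch with randomly generated \(A\), \(b\), and an assumed SPD weight matrix \(M\); none of these come from the problem): solving \(A^T M A x = A^T M b\) makes the residual \(Ax - b\) orthogonal, in the \(M\)-inner product, to every column of \(A\).

```python
import numpy as np

# Random test instance (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# A random symmetric positive definite matrix M = B^T B + I.
B = rng.standard_normal((5, 5))
M = B.T @ B + np.eye(5)

# Solve the generalised normal equations  A^T M A x = A^T M b.
x = np.linalg.solve(A.T @ M @ A, A.T @ M @ b)

# Orthogonality check: <Ax - b, Ay> = (Ay)^T M (Ax - b) = y^T (A^T M r),
# so it suffices that A^T M r vanishes.
r = A @ x - b
print(np.allclose(A.T @ M @ r, 0))  # → True
```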
---
*Assignment 4 is now available on Blackboard and is due on Thursday 4 April at 2pm.*
---
### Bonus problem
This problem is not part of the assessed syllabus, but you might find it interesting.
You might have wondered about the name “least squares”. In this problem we explore where this name comes from.
In some applications, we are not just interested in fitting a function to data, but also in estimating the uncertainty in the fit. Suppose we have data points
\[ (t_1, y_1),\ (t_2, y_2),\ \ldots,\ (t_m, y_m) \]
and we want to fit a linear function
\[ y = c_0 + c_1 t \]
to the data. But now we assume that the data points are subject to *measurement errors*. Specifically, we assume that
\[ y_i = c_0 + c_1 t_i + \varepsilon_i \]
where \(\varepsilon_i\) is a random error term. We assume that the errors \(\varepsilon_i\) are independent and identically distributed with mean 0 and variance \(\sigma^2\).
We want to estimate the coefficients \(c_0\) and \(c_1\) from the data. One way to do this is to choose \(c_0\) and \(c_1\) to minimise the sum of the *squares* of the errors:
\[ S(c_0, c_1) = \sum_{i=1}^m (y_i - c_0 - c_1 t_i)^2. \]
This is called the *method of least squares*. The idea is that by minimising the sum of squared residuals, we are in effect minimising the sample variance of the fitted errors.
Show that the least squares estimates of \(c_0\) and \(c_1\) are given by
\[ \hat{c}_1 = \frac{\sum_{i=1}^m (t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^m (t_i - \bar{t})^2}, \qquad \hat{c}_0 = \bar{y} - \hat{c}_1 \bar{t}, \]
where \(\bar{t} = \frac{1}{m} \sum_{i=1}^m t_i\) and \(\bar{y} = \frac{1}{m} \sum_{i=1}^m y_i\).
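These closed-form estimates can be checked against a generic least squares solve (a sketch on small made-up data, used for illustration only):

```python
import numpy as np

# Illustrative data (not from the problem).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Closed-form estimates from the formulas above.
t_bar, y_bar = t.mean(), y.mean()
c1_hat = np.sum((t - t_bar) * (y - y_bar)) / np.sum((t - t_bar) ** 2)
c0_hat = y_bar - c1_hat * t_bar

# Same answer from the least squares solve with A = [1 | t].
A = np.column_stack([np.ones_like(t), t])
c0, c1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose([c0_hat, c1_hat], [c0, c1]))  # → True
```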
These are the same as the formulas you might have seen in statistics for linear regression.