\expandafter\ifx\csname doTocEntry\endcsname\relax \expandafter\endinput\fi
\doTocEntry\tocsection{1}{\csname a:TocLink\endcsname{1}{x1-10001}{QQ2-1-1}{Introduction}}{1}\relax
\doTocEntry\toclof{a}{\csname a:TocLink\endcsname{1}{x1-1002r1}{}{\ignorespaces \relax }}{subfigure}\relax
\doTocEntry\toclof{b}{\csname a:TocLink\endcsname{1}{x1-1003r2}{}{\ignorespaces \relax }}{subfigure}\relax
\doTocEntry\toclof{2}{\csname a:TocLink\endcsname{1}{x1-10042}{}{\ignorespaces Fitting a polynomial to data sampled from $f(x) = 7.5 \sin (2.5 \pi x)$. The sampled data is shown as red circles, the underlying function as the solid blue curve, and the fitted functions as dotted black lines. For the degree-1 polynomial shown in (a), the chosen polynomial class is unable to correctly represent the target function, while in (b) the chosen polynomial closely resembles the shape of the target function even though it does not pass through all the points perfectly.}}{figure}\relax
\doTocEntry\toclof{a}{\csname a:TocLink\endcsname{1}{x1-1005r1}{}{\ignorespaces \relax }}{subfigure}\relax
\doTocEntry\toclof{b}{\csname a:TocLink\endcsname{1}{x1-1006r2}{}{\ignorespaces \relax }}{subfigure}\relax
\doTocEntry\toclof{4}{\csname a:TocLink\endcsname{1}{x1-10074}{}{\ignorespaces (a) Fitting a degree-6 polynomial to data sampled from $f(x) = 7.5 \sin (2.5 \pi x)$. The sampled data is shown as red circles, the underlying function as the solid blue curve, and the fitted function as the dotted black line. The learned function passes through the training data perfectly and will have a very low training error. But can we say anything about how it will behave on out-of-sample (test) data? (b) shows what happens to the training and test error when we overfit and underfit the training data.}}{figure}\relax
\doTocEntry\tocsubsection{1.1}{\csname a:TocLink\endcsname{1}{x1-20001.1}{QQ2-1-8}{Occam's Razor}}{7}\relax
\doTocEntry\tocsection{2}{\csname a:TocLink\endcsname{1}{x1-30002}{QQ2-1-9}{Regularizing Linear Regression}}{7}\relax
\doTocEntry\toclof{5}{\csname a:TocLink\endcsname{1}{x1-30035}{}{\ignorespaces The red circle represents the constraint imposed on the individual coefficients, while the contour plot shows the sum-of-squares objective. $\mathbf{w}^{*}$ is the ridge regression solution.}}{figure}\relax
\doTocEntry\tocsubsection{2.1}{\csname a:TocLink\endcsname{1}{x1-40002.1}{QQ2-1-11}{An Analytical Solution}}{11}\relax
\doTocEntry\tocloa{}{\csname a:TocLink\endcsname{1}{x1-4004}{}{\numberline {1}{\ignorespaces The gradient descent algorithm for minimizing a function.}}}{14}\relax
\doTocEntry\tocsection{3}{\csname a:TocLink\endcsname{1}{x1-50003}{QQ2-1-12}{Generalizing Ridge Regression}}{15}\relax
\doTocEntry\toclof{6}{\csname a:TocLink\endcsname{1}{x1-50016}{}{\ignorespaces Regularizers corresponding to different values of $q$. When $q=1$ we get the Lasso penalty, which produces sparse solutions by driving most of the coefficients to zero. Lasso can be used for feature selection, but it cannot be solved analytically; instead we need to resort to approximations. $q=2$ gives Tikhonov regularization, resulting in ridge regression.}}{figure}\relax