Constructing GLMs
- Ordinary Least Squares
Ordinary least squares is a special case of the GLM family of models. To see this, consider the setting where the target variable y (also called the response variable in GLM terminology) is continuous, and we model the conditional distribution of y given x as a Gaussian N(μ, σ²). (Here μ may depend on x.) So, we let the ExponentialFamily(η) distribution be the Gaussian distribution. As we saw previously, in the formulation of the Gaussian as an exponential family distribution, we had μ = η. Combining this with the GLM design choice η = θᵀx, we have
$$h_\theta(x) = E[y \mid x; \theta] = \mu = \eta = \theta^T x$$
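The derivation above can be checked numerically. The following is a minimal sketch, assuming NumPy and a fixed variance σ² = 1 (the value of σ² does not affect the choice of θ). It confirms that the Gaussian written in exponential-family form b(y)·exp(ηy − a(η)), with b(y) = exp(−y²/2)/√(2π) and a(η) = η²/2, matches the N(μ, 1) density when η = μ, and then uses the resulting hypothesis h_θ(x) = θᵀx. The helper names (`gaussian_pdf`, `exp_family_gaussian`, `h_theta`) and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(y, mu):
    """Density of N(mu, 1) evaluated at y."""
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2 * np.pi)

def exp_family_gaussian(y, eta):
    """The same density written as b(y) * exp(eta * y - a(eta)), assuming sigma^2 = 1."""
    b = np.exp(-0.5 * y ** 2) / np.sqrt(2 * np.pi)  # base measure b(y)
    a = 0.5 * eta ** 2                               # log-partition function a(eta)
    return b * np.exp(eta * y - a)

def h_theta(theta, X):
    """GLM hypothesis for a Gaussian response: E[y | x; theta] = mu = eta = theta^T x."""
    return X @ theta

# The two parameterizations agree when eta = mu.
ys = np.linspace(-3.0, 3.0, 7)
assert np.allclose(gaussian_pdf(ys, 1.5), exp_family_gaussian(ys, 1.5))

# Hypothetical noiseless data: ordinary least squares recovers the true theta.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # column of ones is the intercept x_0 = 1
true_theta = np.array([2.0, -0.5])
y = h_theta(true_theta, X)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # theta_hat is close to true_theta
```

Fitting θ by least squares is the maximum-likelihood estimate under this Gaussian model; σ² only rescales the log-likelihood, so it does not change which θ is optimal.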
- Logistic Regression
Here we are interested in binary classification, so y ∈ {0, 1}. Given that y is binary-valued, it seems natural to choose the Bernoulli family of distributions to model the conditional distribution of y given x. In our formulation of the Bernoulli distribution as an exponential family distribution, we had $\Phi = 1/(1 + e^{-\eta})$. Furthermore, note that if $y \mid x; \theta \sim \text{Bernoulli}(\Phi)$, then $E[y \mid x; \theta] = \Phi$. So, following a derivation similar to the one for ordinary least squares, we get:
$$h_\theta(x) = E[y \mid x; \theta] = \Phi = \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{-\theta^T x}}$$
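The logistic derivation can be sketched in the same way. This is a minimal illustration, assuming NumPy and the standard exponential-family form of the Bernoulli, p(y; Φ) = exp(ηy − a(η)) with b(y) = 1, η = log(Φ/(1 − Φ)) and a(η) = log(1 + e^η). It checks that inverting η recovers Φ = 1/(1 + e^{−η}), that both forms of the probability mass function agree, and then evaluates h_θ(x) on a few rows. The function names and the small design matrix are hypothetical.

```python
import numpy as np

def sigmoid(eta):
    """Canonical response function for the Bernoulli: phi = 1 / (1 + e^{-eta})."""
    return 1.0 / (1.0 + np.exp(-eta))

def natural_param(phi):
    """Natural parameter of Bernoulli(phi): eta = log(phi / (1 - phi))."""
    return np.log(phi / (1.0 - phi))

def bernoulli_pmf(y, phi):
    """Standard form phi^y * (1 - phi)^(1 - y) for y in {0, 1}."""
    return phi ** y * (1.0 - phi) ** (1 - y)

def exp_family_bernoulli(y, eta):
    """Exponential-family form b(y) * exp(eta * y - a(eta)), b(y) = 1, a(eta) = log(1 + e^eta)."""
    return np.exp(eta * y - np.log1p(np.exp(eta)))

def h_theta(theta, X):
    """GLM hypothesis for a Bernoulli response: E[y | x; theta] = sigmoid(theta^T x)."""
    return sigmoid(X @ theta)

phis = np.array([0.1, 0.5, 0.9])
etas = natural_param(phis)

# Inverting eta = log(phi / (1 - phi)) gives back the sigmoid.
assert np.allclose(sigmoid(etas), phis)

# Both forms of the pmf agree for y = 0 and y = 1.
for y in (0, 1):
    assert np.allclose(bernoulli_pmf(y, phis), exp_family_bernoulli(y, etas))

# Hypothetical inputs; the first column is the intercept term x_0 = 1.
X = np.array([[1.0, 0.0], [1.0, 2.0], [1.0, -2.0]])
theta = np.array([0.0, 1.0])
probs = h_theta(theta, X)  # estimated P(y = 1 | x; theta) for each row
```

Because the response function is the sigmoid, the output always lies in (0, 1) and can be read directly as the estimated probability that y = 1.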
- Softmax Regression
We now derive a GLM for modelling multinomial data, where the response y can take on any one of k values, y ∈ {1, 2, ..., k}. To do so, we begin by expressing the multinomial as an exponential family distribution.
To parameterize a multinomial over k possible outcomes, one could use k parameters $\Phi_1, \ldots, \Phi_k$ specifying the probability of each of the outcomes. However, these parameters would be redundant, or more formally, they would not be independent (since knowing any k-1 of the