diff --git a/main.tex b/main.tex
index abf82d6..387cc7c 100644
--- a/main.tex
+++ b/main.tex
@@ -10,6 +10,7 @@
 \usepackage{fancyhdr}
 \usepackage{url}
 \usepackage{hyperref}
+\usepackage{amsmath}
 \usepackage{hyperref}
 
 \hypersetup{
@@ -95,5 +96,22 @@ This works well for monovariate problems. In the case of multivariate problems,
 
 \section{Logistic Regression}
+Logistic regression is fundamentally different from linear regression in the sense that it returns the probability of an input belonging to one class. This probability can be compared to a threshold (usually $0.5$) to obtain the predicted class of the input. In this sense, logistic regression does not return a real number but a categorical output. In this context, the ``training'' data consist of inputs that are real numbers and outputs that are binary (either 0 or 1). Logistic regression can be viewed as an extension of linear regression in which the affine curve is ``bent'' so that it stays confined between 0 and 1. We start by expressing the logistic curve as the probability of the input $x$ belonging to class 1:
+
+\begin{equation}
+	p(x) = \dfrac{1}{1+e^{-y}} = \dfrac{1}{1+e^{-(\beta_1 x+\beta_0)}}
+\end{equation}
+
+We recognize the linear regression function used to predict $y$ from $x$, but it is now embedded in the larger logistic function, which constrains the output between 0 and 1. We still have to find the best values for $\beta$, and here we can use the maximum likelihood estimator (or the negative log-likelihood, for practicality) and optimize it. The loss for the $k^{th}$ point, with $p_k = p(x_k)$, is expressed as:
+
+\begin{equation}
+L_k =
+\begin{cases}
+	-\ln(p_k) & \text{if } y_k = 1\\
+	-\ln(1-p_k) & \text{if } y_k = 0\\
+\end{cases}
+= -\ln\left[\left(\dfrac{1}{1+e^{-(\beta_1 x_k+\beta_0)}}\right)^{y_k}\left(\dfrac{1}{1+e^{\beta_1 x_k+\beta_0}}\right)^{1-y_k}\right]
+\end{equation}
+The issue now is that this loss is non-linear in $\beta$, so we cannot simply take the derivative, set it to zero, and solve for the optimum in closed form. Numerical methods such as gradient descent or Newton--Raphson need to be leveraged to approximate the optimal $\beta$ values.
 
 
 \end{document}
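
As a companion to the last paragraph of the added section, here is a minimal sketch of approximating the optimal $\beta$ values by gradient descent on the negative log-likelihood for the one-dimensional model above. It is illustrative only and not part of the patch; the toy data, learning rate, iteration count, and helper names (sigmoid, fit_logistic) are assumptions.

# Minimal sketch of fitting the 1-D logistic regression above by gradient
# descent on the negative log-likelihood. Data, learning rate and iteration
# count are illustrative assumptions, not values from the text.
import math

def sigmoid(z):
    # p(x) = 1 / (1 + exp(-(b1*x + b0))) evaluated at z = b1*x + b0
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Return (b0, b1) minimising the summed per-point loss L_k."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the negative log-likelihood:
        # dL/db0 = sum(p_k - y_k), dL/db1 = sum((p_k - y_k) * x_k)
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b1 * x + b0)
            g0 += p - y
            g1 += (p - y) * x
        # Step against the (averaged) gradient
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Toy data: inputs below about 2.5 belong to class 0, above to class 1.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, [round(sigmoid(b1 * x + b0), 2) for x in xs])

Newton-Raphson would follow the same loop but would also use second-derivative information to choose the step, typically converging in far fewer iterations.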