This works well for monovariate problems. In the case of multivariate problems, the same least-squares approach can be written compactly in matrix form.
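As a sketch of that matrix formulation (assuming a design matrix $X$ whose rows are the input vectors and a target vector $y$; these symbols are not defined in the text above), the least-squares estimate becomes:
\begin{equation}
\hat{\beta} = (X^\top X)^{-1} X^\top y
\end{equation}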
\section{Logistic Regression}
Logistic regression is fundamentally different from linear regression in the sense that it returns the probability of an input belonging to one class. This probability can be compared to a threshold (usually $0.5$) to obtain the predicted class of the input. In this sense, logistic regression does not return a real number but a categorical output. In this context, the ``training'' data consist of inputs that are real numbers and outputs that are binary (either 0 or 1). Logistic regression can be viewed as an extension of linear regression where the affine curve is ``bent'' so that it is confined between 0 and 1. We start by expressing the logistic curve as the probability of the input $x$ belonging to class 1.
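A standard way to write this curve (a sketch using the common sigmoid parameterization; the intercept $\beta_0$ and slope $\beta_1$ are our notation, not fixed by the text above) is:
\begin{equation}
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{equation}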
We find again the linear regression function that is used to predict $y$ from $x$, but it is now embedded in the larger logistic function to constrain the output between 0 and 1. We still have to find the best value for $\beta$, and in this case we can use the maximum likelihood estimator (or the negative log-likelihood for practicality) and optimize it. The loss for the $k^{\text{th}}$ point is expressed as:
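Written with the notation above (using $p_k$ as shorthand for the predicted probability of the $k^{\text{th}}$ point, an abbreviation we introduce here), the negative log-likelihood of one observation is:
\begin{equation}
\mathcal{L}_k = -\Bigl[\, y_k \ln p_k + (1 - y_k) \ln (1 - p_k) \,\Bigr],
\qquad p_k = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_k)}}
\end{equation}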
The issue now is that this function is non-linear, so we cannot simply take its derivative and solve for zero to obtain the optimum in closed form. Numerical methods such as gradient descent or Newton–Raphson need to be leveraged to approximate the optimal value of $\beta$.
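As an illustration, here is a minimal gradient descent sketch in Python (assuming NumPy; the function names, learning rate, and step count are ours, not from the text) that minimizes the mean negative log-likelihood above:
\begin{verbatim}
import numpy as np

def sigmoid(z):
    # Logistic function mapping real values to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, n_steps=5000):
    # Gradient descent on the mean negative log-likelihood.
    # x, y are 1-D arrays; returns the fitted (beta0, beta1).
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_steps):
        p = sigmoid(beta0 + beta1 * x)   # predicted P(y = 1 | x)
        # Gradient of the mean loss w.r.t. each parameter
        grad0 = np.mean(p - y)
        grad1 = np.mean((p - y) * x)
        beta0 -= lr * grad0
        beta1 -= lr * grad1
    return beta0, beta1

# Hypothetical usage on toy data:
x = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])
b0, b1 = fit_logistic(x, y)
\end{verbatim}
Newton--Raphson would follow the same structure but rescale each step by the inverse Hessian of the loss, which typically converges in far fewer iterations at the cost of more work per step.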