feat: add logistic regression
commit aad97dbfe9
parent de8b685aaf
1 changed file with 18 additions and 0 deletions
main.tex  +18  -0
@@ -10,6 +10,7 @@
\usepackage{fancyhdr}
\usepackage{url}
\usepackage{hyperref}
\usepackage{amsmath}
\usepackage{hyperref}
\hypersetup{
@@ -95,5 +96,22 @@ This works well for monovariate problems. In the case of multivariate problems,
\section{Logistic Regression}

Logistic regression is fundamentally different from linear regression in the sense that it returns the probability of an input belonging to one class. This probability can be compared to a threshold (usually $0.5$) to obtain the predicted class of the input. In this sense, logistic regression does not return a real number but a categorical output. The ``training'' data therefore consist of inputs that are real numbers and outputs that are binary (either 0 or 1). Logistic regression can be viewed as an extension of linear regression in which the affine curve is ``bent'' so that it stays confined between 0 and 1. We start by expressing the logistic curve as the probability of the input $x$ belonging to class 1.

\begin{equation}
p(x) = \dfrac{1}{1+e^{-y}} = \dfrac{1}{1+e^{-(\beta_1 x+\beta_0)}}
\end{equation}
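
Note that $p(x) = 0.5$ exactly when $\beta_1 x + \beta_0 = 0$: with the usual threshold, the predicted class therefore switches precisely where the underlying linear function changes sign.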

We find again the linear regression function that is used to predict $y$ from $x$, but it is now embedded in the larger logistic function so that the output is constrained between 0 and 1. We still have to find the best values of $\beta$, and in this case we can use the likelihood (or the negative log-likelihood, for practicality) as the quantity to optimize. The loss for the $k^{\text{th}}$ point is expressed as:

\begin{equation}
L_k =
\begin{cases}
-\ln(p_k) & \text{if } y_k = 1\\
-\ln(1-p_k) & \text{if } y_k = 0
\end{cases}
= -\ln\left[\left(\dfrac{1}{1+e^{-(\beta_1 x_k+\beta_0)}}\right)^{y_k}\left(\dfrac{1}{1+e^{\beta_1 x_k+\beta_0}}\right)^{1-y_k}\right]
\end{equation}
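
Writing $n$ for the number of training points (a symbol introduced here for convenience), the quantity that is actually minimized is the sum of these per-point losses, i.e. the total negative log-likelihood:

\begin{equation}
L(\beta_0, \beta_1) = \sum_{k=1}^{n} L_k = -\sum_{k=1}^{n} \left[ y_k \ln(p_k) + (1-y_k)\ln(1-p_k) \right]
\end{equation}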

The issue now is that setting the derivative of this loss to zero yields equations that are non-linear in $\beta$, so we cannot simply solve them in closed form to get the optimum. Numerical methods such as gradient descent or Newton–Raphson need to be leveraged to approximate the optimal $\beta$ values.
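
As a sketch of what a gradient descent iteration looks like here ($\alpha$ denotes a learning rate, a parameter not introduced above), the partial derivatives of the total loss take a simple form and each step moves $\beta$ against them:

\begin{equation}
\dfrac{\partial L}{\partial \beta_0} = \sum_{k=1}^{n}(p_k - y_k), \qquad
\dfrac{\partial L}{\partial \beta_1} = \sum_{k=1}^{n}(p_k - y_k)\,x_k, \qquad
\beta_j \leftarrow \beta_j - \alpha \dfrac{\partial L}{\partial \beta_j}
\end{equation}

The update is repeated until the loss (or the $\beta$ values themselves) stops changing appreciably.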
\end{document}