This works well for monovariate problems. In the case of multivariate problems, the same least-squares approach can be written compactly in matrix form.
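As a sketch of that matrix formulation (assuming a design matrix $X$ whose rows are the input vectors and a target vector $y$; these symbols are not defined in the text above), the least-squares estimate becomes:
\begin{equation}
\hat{\beta} = (X^\top X)^{-1} X^\top y
\end{equation}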
\section{Logistic Regression}
Logistic regression is fundamentally different from linear regression in the sense that it returns the probability of an input belonging to one class. This probability can be compared to a threshold (usually $0.5$) to obtain the predicted class of the input. In this sense, logistic regression does not return a real number but a categorical output. In this context, the ``training'' data consist of inputs that are real numbers and outputs that are binary (either 0 or 1). Logistic regression can be viewed as an extension of linear regression where the affine curve is ``bent'' so that it is confined between 0 and 1. We start by expressing the logistic curve as the probability of the input $x$ belonging to class 1.
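A standard way to write this curve (a sketch using the common sigmoid parameterization; the intercept $\beta_0$ and slope $\beta_1$ are our notation, not fixed by the text above) is:
\begin{equation}
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{equation}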
We find again the linear regression function that is used to predict $y$ from $x$, but it is now embedded in the larger logistic function to constrain the output between 0 and 1. We still have to find the best value for $\beta$, and in this case we can use the maximum likelihood estimator (or the negative log-likelihood for practicality) and optimize it. The loss for the $k^{\text{th}}$ point is expressed as:
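Written with the notation above (using $p_k$ as shorthand for the predicted probability of the $k^{\text{th}}$ point, an abbreviation we introduce here), the negative log-likelihood of one observation is:
\begin{equation}
\mathcal{L}_k = -\Bigl[\, y_k \ln p_k + (1 - y_k) \ln (1 - p_k) \,\Bigr],
\qquad p_k = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_k)}}
\end{equation}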
The issue now is that this function is non-linear, so we cannot simply take its derivative and solve for zero to obtain the optimum in closed form. Numerical methods such as gradient descent or Newton–Raphson need to be leveraged to approximate the optimal value of $\beta$.
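As an illustration, here is a minimal gradient descent sketch in Python (assuming NumPy; the function names, learning rate, and step count are ours, not from the text) that minimizes the mean negative log-likelihood above:
\begin{verbatim}
import numpy as np

def sigmoid(z):
    # Logistic function mapping real values to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, n_steps=5000):
    # Gradient descent on the mean negative log-likelihood.
    # x, y are 1-D arrays; returns the fitted (beta0, beta1).
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_steps):
        p = sigmoid(beta0 + beta1 * x)   # predicted P(y = 1 | x)
        # Gradient of the mean loss w.r.t. each parameter
        grad0 = np.mean(p - y)
        grad1 = np.mean((p - y) * x)
        beta0 -= lr * grad0
        beta1 -= lr * grad1
    return beta0, beta1

# Hypothetical usage on toy data:
x = np.array([0.5, 1.5, 2.0, 3.0, 3.5, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])
b0, b1 = fit_logistic(x, y)
\end{verbatim}
Newton--Raphson would follow the same structure but rescale each step by the inverse Hessian of the loss, which typically converges in far fewer iterations at the cost of more work per step.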