Day 14: Gradient Descent for Logistic Regression
- eyereece
- Jul 13, 2023
- 2 min read
Updated: Jul 27, 2023
We have previously looked at gradient descent for linear regression. Today, we will look at gradient descent for logistic regression.
Recall that the purpose of running gradient descent is to find the values of the parameters w and b that minimize the cost function J.
Recall:
The model function we use in logistic regression:
f_wb(x) = g(w · x + b), where g(z) = 1 / (1 + e^(-z)) is the sigmoid function.
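For instance, with made-up values w = [2, -1], b = 0.5, and x = [1, 3] (illustrative numbers, not from any dataset): z = w · x + b = 2(1) + (-1)(3) + 0.5 = -0.5, so f_wb(x) = g(-0.5) ≈ 0.38, which we read as a predicted probability of about 38% that y = 1.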
The algorithm to minimize the cost function:
repeat until convergence:
    w_j = w_j - alpha * dJ(w,b)/dw_j    (for j = 0 .. n-1, updated simultaneously)
    b   = b   - alpha * dJ(w,b)/db
So, the gradient descent algorithm for logistic regression becomes:
repeat until convergence:
    w_j = w_j - alpha * (1/m) * sum over i of (f_wb(x[i]) - y[i]) * x[i,j]
    b   = b   - alpha * (1/m) * sum over i of (f_wb(x[i]) - y[i])
where f_wb(x[i]) = g(w · x[i] + b), m is the number of training examples, and alpha is the learning rate.
Gradient descent implementation in Python
The gradient descent algorithm implementation has two components:
The loop implementing the gradient descent algorithm (gradient_descent).
The calculation of the partial derivatives, i.e. the derivative terms inside the update rules above (compute_gradient_logistic).
The partial derivative with respect to w[j] will be denoted dj_dw, and the one with respect to b will be denoted dj_db.
To implement the partial derivatives to find parameters w and b:
Initialize variables to accumulate dj_dw and dj_db.
For each example:
    calculate the error for that example: f_wb(x[i]) - y[i]
    for each input value xj_i in this example, multiply the error by the input xj_i and add it to the corresponding element of dj_dw
    add the error to dj_db
Divide dj_db and dj_dw by the total number of examples (m).
Note that x[i] in NumPy is X[i,:] or X[i], and xj_i is X[i,j].
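The two functions below call sigmoid and compute_cost_logistic, which are not defined in this post. Here is a minimal sketch of those helpers and the required imports, assuming standard implementations (the exact versions used earlier in this series may differ):

import copy
import math
import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

def compute_cost_logistic(X, y, w, b):
    # average logistic (cross-entropy) loss over all m examples
    f_wb = sigmoid(X @ w + b)
    return np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))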
def compute_gradient_logistic(X, y, w, b):
    m, n = X.shape
    dj_dw = np.zeros((n,))                         # accumulator for the gradient w.r.t. w
    dj_db = 0.                                     # accumulator for the gradient w.r.t. b
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)      # prediction for example i
        err_i = f_wb_i - y[i]                      # error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j]  # accumulate error * input for feature j
        dj_db = dj_db + err_i                      # accumulate error
    dj_dw = dj_dw / m                              # average over all m examples
    dj_db = dj_db / m
    return dj_db, dj_dw
Notations:
Args:
    X (ndarray (m,n)) : data, m examples with n features
    y (ndarray (m,))  : target values
    w (ndarray (n,))  : model parameters
    b (scalar)        : model parameter
Returns:
    dj_dw (ndarray (n,)) : the gradient of the cost w.r.t. the parameters w
    dj_db (scalar)       : the gradient of the cost w.r.t. the parameter b
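As a quick sanity check, compute_gradient_logistic can be called on a tiny made-up dataset; the numbers below are arbitrary and only meant to show the shapes involved:

X_tmp = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])  # 4 examples, 2 features
y_tmp = np.array([0., 0., 1., 1.])
w_tmp = np.zeros(2)
b_tmp = 0.
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_tmp, y_tmp, w_tmp, b_tmp)
print(f"dj_db: {dj_db_tmp}")   # scalar, gradient of the cost w.r.t. b
print(f"dj_dw: {dj_dw_tmp}")   # shape (2,), gradient of the cost w.r.t. w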
def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    # A list to store the cost J at each iteration
    J_history = []
    w = copy.deepcopy(w_in)   # avoid modifying the caller's array
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)
        # Update parameters using w, b, alpha, and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save the cost J at each iteration (capped to limit memory use)
        if i < 100000:
            J_history.append(compute_cost_logistic(X, y, w, b))
        # Print the cost at roughly 10 evenly spaced intervals
        # (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}")
    return w, b, J_history
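Putting the two pieces together, a full run on the toy data from the sanity check above might look like this; alpha and num_iters are arbitrary choices for illustration:

w_init = np.zeros(2)
b_init = 0.
alpha = 0.1
num_iters = 1000
w_out, b_out, J_hist = gradient_descent(X_tmp, y_tmp, w_init, b_init, alpha, num_iters)
print(f"Updated parameters: w = {w_out}, b = {b_out}")
print(f"Final cost: {J_hist[-1]}")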