
### Lachlan Miller

# Machine Learning week 1: Cost Function, Gradient Descent and Univariate Linear Regression

I have started doing Andrew Ng’s popular machine learning course on Coursera. The first week covers a *lot*, at least for someone who hasn’t touched much calculus for a few years:

- Cost Functions (mean squared error)
- Gradient Descent
- Linear Regression

## Understanding and Calculating the Cost Function for Linear Regression

This post will focus on the properties and application of cost functions, and how to solve them by hand. Then we will implement the calculations twice in Python: once with `for` loops, and once with vectorised operations using NumPy. This goes into more detail than my previous article about linear regression, which was more of a high-level summary of the concepts.
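As a preview, the two implementations might look something like this. This is a minimal sketch with the function names my own, using the course's convention of dividing the summed squared error by 2m:

```python
import numpy as np

def cost_loop(X, y, theta0, theta1):
    """Mean squared error cost computed with an explicit for loop."""
    m = len(X)
    total = 0.0
    for i in range(m):
        prediction = theta0 + theta1 * X[i]   # hypothesis for one example
        total += (prediction - y[i]) ** 2
    return total / (2 * m)

def cost_vectorised(X, y, theta0, theta1):
    """The same cost computed with NumPy vector operations."""
    m = len(X)
    errors = (theta0 + theta1 * X) - y        # all residuals at once
    return (errors @ errors) / (2 * m)        # sum of squares via dot product
```

Both functions compute the same value; the vectorised version simply pushes the loop down into NumPy.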

When learning about linear regression in Andrew Ng’s Coursera course, two key concepts are introduced:

- the cost function
- gradient descent

### From Stack Overflow (attributed)

# Can someone explain to me the difference between a cost function and the gradient descent equation in logistic regression?

Whenever you train a model with your data, you produce new, predicted values for the target variable. The dataset, however, already contains the real values for that variable. We know that the closer the predicted values are to their corresponding real values, the better the model.

Now, we use a cost function to measure how close the predicted values are to their corresponding real values.

We should also keep in mind that the weights of the trained model are what determine how accurately it predicts new values. Imagine that our model is y = 0.9*X + 0.1; the predicted value is simply 0.9*X + 0.1 for each X. (Here 0.9 and 0.1 are just arbitrary values chosen for illustration.)

So, taking Y as the real value corresponding to each X, the cost function measures how close 0.9*X + 0.1 is to Y.
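For instance, with a handful of made-up data points (the real values Y below are invented purely for illustration), the cost of this model can be computed directly; here I use the plain mean of the squared errors, whereas Ng's course divides by 2m instead:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])           # hypothetical inputs
Y = np.array([1.2, 1.9, 2.8])           # hypothetical real values
predicted = 0.9 * X + 0.1               # the model's predictions: [1.0, 1.9, 2.8]
cost = np.mean((predicted - Y) ** 2)    # mean squared error, ≈ 0.0133
```

A smaller cost would mean the line 0.9*X + 0.1 passes closer to the real Y values.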

We are responsible for finding the weights (here 0.9 and 0.1) that give our model the lowest cost, i.e. predicted values that are as close as possible to the real ones.

Gradient descent is an optimization algorithm (one of several), and its job is to find the minimum cost by trying the model with different weights, or rather, by repeatedly updating the weights.

We first run our model with some initial weights; gradient descent then updates the weights and evaluates the model's cost with those weights, over thousands of iterations, until it finds the minimum cost.
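Sketched in code for the univariate linear model above (the learning rate, iteration count, and starting weights are arbitrary choices of mine), that loop looks like:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, iterations=5000):
    """Update the weights over many iterations, following the
    negative gradient of the mean-squared-error cost."""
    m = len(X)
    theta0, theta1 = 0.0, 0.0               # arbitrary initial weights
    for _ in range(iterations):
        errors = (theta0 + theta1 * X) - y  # predicted minus real values
        # Partial derivatives of the cost with respect to each weight
        grad0 = errors.sum() / m
        grad1 = (errors * X).sum() / m
        theta0 -= alpha * grad0             # update (not "minimise") the weights
        theta1 -= alpha * grad1
    return theta0, theta1
```

Each iteration updates both weights at once; the algorithm never shrinks the weights for their own sake, it only moves them toward the values that minimise the cost.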

One point worth noting: gradient descent does not minimise the weights themselves, it just updates them. What the algorithm is searching for is the minimum cost.