Neural Network Basics (Part 2)

Optimising with Gradient Descent
machinelearning
ai
mathematics
Author

Tony Phung

Published

February 2, 2024

Automating the search for the best parameters (lowest loss), measured by Mean Absolute Error (MAE), using gradient descent on our quadratic function.

1. Import Libraries

from ipywidgets import interact
from fastai.basics import *
import pandas as pd
from functools import partial

2. Upload Data and Convert It to PyTorch Tensors

df = pd.read_csv("upload_dataset.csv")
x_trch = torch.tensor(df.x) 
y_trch = torch.tensor(df.y)

3. Create Customisable Quadratic Functions and Interactively Plot with MAE

def gen_quad_fn(a,b,c,x): return a*x**2 + b*x + c
def custom_quad_fn(a,b,c): return partial(gen_quad_fn,a,b,c)
def torch_mae(prediction, actual): return (torch.abs(prediction-actual).mean())
def torch_mse(prediction, actual): return ((prediction-actual)**2).mean()
plt.rc('figure', dpi=90)

@interact(a=(0,2.1,0.1), b=(0,2.1,0.1), c=(0,2.1,0.1))
def interactive_plot(a,b,c):
    # 1. plot the data points
    plt.scatter(x_trch, y_trch)
    # 2. build the quadratic's predictions for the slider values a, b, c
    xs_interact = x_trch
    ys_interact = custom_quad_fn(a,b,c)(xs_interact)
    # 3. calculate the MAE between predictions and actuals
    y_actual     = y_trch
    y_predicted  = custom_quad_fn(a,b,c)(x_trch)
    interact_mae = torch_mae(y_predicted, y_actual)
    # 4. plot the quadratic and show the loss in the title
    plt.ylim(-1,15)
    plt.plot(xs_interact, ys_interact)
    plt.title(f"MAE: {interact_mae:.2f}")

4. Determining the effect of the parameters (\(a\), \(b\), \(c\)) in: \(ax^2 + bx + c\)

The key thing to understand is whether the loss gets better or worse when you increase a parameter a little.

There are two ways we can try (a small finite-difference sketch follows this list):

1. Manually adjust the parameter: nudge each parameter a little in each direction and observe the impact on the MAE.
2. Calculate the derivative with respect to the parameter: a derivative is a function that tells you, when you increase the input:

- the direction in which the output changes (increases or decreases), and
- the magnitude of that change.
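
To make the first approach concrete, here is a small finite-difference sketch (not part of the original notebook; it reuses custom_quad_fn and torch_mae from above, and eps is an assumed step size):

# Manually nudge the `a` parameter and watch the MAE move.
eps = 0.01  # assumed small step size
base_loss  = torch_mae(custom_quad_fn(1.0, 1.0, 1.0)(x_trch), y_trch)
nudge_loss = torch_mae(custom_quad_fn(1.0 + eps, 1.0, 1.0)(x_trch), y_trch)
# A negative value means increasing `a` decreases the loss; this direction and
# magnitude is exactly the information a derivative gives us automatically.
print(f"approx. change in MAE per unit change in a: {((nudge_loss - base_loss) / eps).item():.4f}")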

4.1 Create Mean-Absolute-Error (mae) Quadratic Function

This function takes in the parameters (coefficients) of a quadratic function and outputs the MAE.
- Input: coefficients of the quadratic
- Output: MAE (between the predictions of the quadratic with those coefficients and the actual values)

def mae_quad_fn(x_trch, y_trch, abc_params):
    # e.g. abc_params = [2,3,4] -> creates the quadratic 2x^2 + 3x + 4
    quad_fn = custom_quad_fn(*abc_params)
    y_predicted_trch = quad_fn(x_trch)
    y_actual_trch    = y_trch
    return torch_mae(y_predicted_trch, y_actual_trch)

def mse_quad_fn(x_trch, y_trch, abc_params):
    # same idea, but returns the Mean Squared Error instead
    quad_fn = custom_quad_fn(*abc_params)
    y_predicted_trch = quad_fn(x_trch)
    y_actual_trch    = y_trch
    return torch_mse(y_predicted_trch, y_actual_trch)

The chart shows a loss of MAE(2, 2, 2) = 1.4501; our mae_quad_fn calculates a similar loss of 1.491.

mae_quad_fn(x_trch=x_trch,y_trch=y_trch,abc_params=[1.0,1.0,1.0])
tensor(2.6103, dtype=torch.float64)

A tensor is a PyTorch type that works with:
- lists of numbers (1D tensors)
- tables of numbers (2D tensors)
- layers of tables of numbers (3D tensors), and so on
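
For example (a small illustration, not from the original notebook):

rank1 = torch.tensor([1.0, 2.0, 3.0])              # 1D: a list, shape (3,)
rank2 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])     # 2D: a table, shape (2, 2)
rank3 = torch.zeros(2, 3, 4)                       # 3D: a stack of tables, shape (2, 3, 4)
print(rank1.shape, rank2.shape, rank3.shape)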

4.2 Telling PyTorch to calculate gradients

By calling the method .requires_grad_(), our abc_rg tensor will now calculate gradients whenever we use it in a calculation.

# rank 1 tensor
abc_rg = torch.tensor([1.0,1.0,1.0])
abc_rg.requires_grad_()
tensor([1., 1., 1.], requires_grad=True)
abc_rg
tensor([1., 1., 1.], requires_grad=True)

4.2.1 Method .requires_grad_()

grad_fn=<MeanBackward0> shows that PyTorch is tracking how the loss was computed, so gradients can be calculated for each parameter (our inputs).

loss = mae_quad_fn(x_trch, y_trch, abc_rg)
loss
tensor(2.6103, dtype=torch.float64, grad_fn=<MeanBackward0>)

4.2.2 Method .backward()

Calling .backward() on the loss calculates the gradients and stores them in a new .grad attribute on our abc_rg tensor.

loss.backward()

4.2.3 Attribute .grad

This attribute tells us, for each position in the tensor, whether increasing that input slightly will increase the loss (if the gradient is positive) or decrease it (if negative).

abc_rg.grad
tensor([-1.3529, -0.0316, -0.5000])
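
All three gradients are negative here, so increasing any of the parameters slightly should reduce the loss. A quick sanity check (a sketch, not from the original notebook; the 0.001 nudge is an assumed value):

# Nudge the first parameter `a` up by a tiny amount and confirm the loss
# drops, as the negative gradient predicts.
with torch.no_grad():
    nudged = abc_rg.detach().clone()
    nudged[0] += 0.001
    print(mae_quad_fn(x_trch, y_trch, abc_rg), mae_quad_fn(x_trch, y_trch, nudged))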

4.2.4 Increase our abc parameters and recalculate loss

with torch.no_grad():
    print(f"loss before: {loss}")
    # step each parameter a little against its gradient (the gradients are negative, so the parameters increase)
    abc_rg -= abc_rg.grad * 0.01
    loss = mae_quad_fn(x_trch, y_trch, abc_rg)
    print(f"loss after: {loss}")
loss before: 2.61030324932801
loss after: 2.5894896953092177
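
The torch.no_grad() context matters here: the update itself should not become part of the gradient calculation, and PyTorch refuses an in-place change to a leaf tensor that requires gradients outside it. A small illustration of the error (a sketch, not from the original notebook):

# Attempting the same update outside no_grad raises a RuntimeError,
# because abc_rg is a leaf tensor with requires_grad=True.
try:
    abc_rg -= abc_rg.grad * 0.01
except RuntimeError as e:
    print(e)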

4.2.5 Automate it

Create a loop that iteratively decreases the loss by nudging the parameters against their gradients (here the gradients are negative, so the parameters increase; with positive gradients they would decrease).

for i in range(10):
    loss = mae_quad_fn(x_trch, y_trch, abc_rg)
    loss.backward()   # note: .grad is never reset here, so gradients accumulate across steps
    with torch.no_grad(): abc_rg -= abc_rg.grad * 0.01
    print(f"step {i}: {loss} - {abc_rg.grad}")
step 0: 2.5894896953092177 - tensor([-2.7058, -0.0632, -1.0000])
step 1: 2.547862587271633 - tensor([-4.0587, -0.0947, -1.5000])
step 2: 2.4854217639359875 - tensor([-5.4116, -0.1263, -2.0000])
step 3: 2.4021673865815485 - tensor([-6.7645, -0.1579, -2.5000])
step 4: 2.2980994552083187 - tensor([-8.1175, -0.1895, -3.0000])
step 5: 2.173217969816296 - tensor([-9.4704, -0.2211, -3.5000])
step 6: 2.0300959430578267 - tensor([-10.6892,  -0.3684,  -3.9000])
step 7: 1.883669135864714 - tensor([-11.9080,  -0.5158,  -4.3000])
step 8: 1.740979068220988 - tensor([-12.9396,  -0.8000,  -4.6000])
step 9: 1.5914231086209807 - tensor([-13.9712,  -1.0842,  -4.9000])
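
One thing worth noting: the loop above never resets abc_rg.grad, so each .backward() call adds to the gradients already stored, which is why the printed gradients keep growing. A sketch of the more conventional loop, with the gradients zeroed after each update (lr and n_steps are assumed values, not from the original notebook):

# Same idea, but resetting .grad each step so every update uses only the
# gradient of the current loss rather than an accumulated sum.
lr, n_steps = 0.01, 10          # assumed learning rate and step count
abc = torch.tensor([1.0, 1.0, 1.0]).requires_grad_()
for i in range(n_steps):
    loss = mae_quad_fn(x_trch, y_trch, abc)
    loss.backward()
    with torch.no_grad():
        abc -= abc.grad * lr
        abc.grad.zero_()        # reset the accumulated gradients
    print(f"step {i}: loss {loss.item():.4f}")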

5. Parameters are getting closer

The parameters started at 1, 1, 1 and are now about 1.9, 1.0, 1.3. The underlying function was modelled with 3, 2, 1, so it's getting there!

[Future Iteration] How can we fix (freeze) one parameter and move only the others? One possible approach is sketched after the output below.

abc_rg
tensor([1.8739, 1.0365, 1.3170], requires_grad=True)
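
One possible way to freeze a parameter (a hypothetical sketch, not from the notebook) is to zero that parameter's gradient before the update, so it never moves:

# Sketch: freeze parameter c (index 2) by zeroing its gradient before updating.
loss = mae_quad_fn(x_trch, y_trch, abc_rg)
loss.backward()
with torch.no_grad():
    abc_rg.grad[2] = 0.0        # c receives no update
    abc_rg -= abc_rg.grad * 0.01
    abc_rg.grad.zero_()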

To be Continued…

Next: a universal function called the ReLU function (rather than a quadratic function) is used for our modelling.

Neural Network Basics: Part 1
Neural Network Basics: Part 2
Neural Network Basics: Part 3