Automation of finding the best parameters (lowest loss) based on Mean Absolute Error (MAE) using Gradient Descent for our Quadratic Function
1. Import Libraries
from ipywidgets import interact
from fastai.basics import *
import pandas as pd
from functools import partial
2. Upload Data and Convert Data to Pytorch Tensors
df = pd.read_csv("upload_dataset.csv")
x_trch = torch.tensor(df.x)
y_trch = torch.tensor(df.y)
3. Create Customisable Quadratic functions and Interactively Plot with MAE
def gen_quad_fn(a,b,c,x): return a*x**2 + b*x + c
def custom_quad_fn(a,b,c): return partial(gen_quad_fn,a,b,c)
def torch_mae(prediction, actual): return (torch.abs(prediction-actual).mean())
def torch_mse(prediction, actual): return ((prediction-actual)**2).mean()
# def mae(prediction, actual): return np.mean(abs(prediction-actual))
# def torch_mae(prediction, actual): return np.mean(torch.abs(prediction-actual))
# def mae(prediction, actual): return (torch.abs(prediction-actual).mean())
# def mae2(prediction, actual): return abs(prediction-actual).mean()
# def mae_jh(prediction, actual): return (abs(prediction-actual)).mean()
# def mse_jh(prediction, actual): return ((prediction-actual)**2).mean()
# def mae(preds, acts): return (torch.abs(preds-acts)).mean()
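For example (a quick illustration added for clarity), custom_quad_fn(3, 2, 1) returns a new function of \(x\) alone, \(f(x) = 3x^2 + 2x + 1\):
f = custom_quad_fn(3, 2, 1)
f(2.0)  # 3*4 + 2*2 + 1 = 17.0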
plt.rc('figure', dpi=90)
@interact(a=(0,2.1,0.1),b=(0,2.1,0.1),c=(0,2.1,0.1))
def interactive_plot(a, b, c):
    # 1. plot scatter
    plt.scatter(x_trch, y_trch)
    # 2. create custom_quad_interactive_fn
    # 2.1 create xs_interact
    xs_interact = x_trch
    # 3. create ys_interact
    plt.ylim(-1, 15)
    ys_interact = custom_quad_fn(a, b, c)(xs_interact)
    # 4. calc mae
    y_actual = y_trch
    y_predicted = custom_quad_fn(a, b, c)(x_trch)
    interact_mae = torch_mae(y_predicted, y_actual)
    # 5. plot
    plt.plot(xs_interact, ys_interact)
    plt.title(f"MAE: {interact_mae:.2f}")
4. Determining the effect of the parameters (\(a\), \(b\), \(c\)) in: \(ax^2 + bx + c\)
The key thing to understand is whether the loss function gets better or worse when you increase a parameter a little.
There are two ways we can try:
1. Manually adjust the parameter: move each parameter a little in each direction and observe the impact on the MAE (see the sketch after this list).
2. Calculate the derivative with respect to the parameter: a derivative is a function that tells you, if you increase the input:
- the direction in which the output changes (increases or decreases), and
- the magnitude of the change to the output.
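To make option 1 concrete, here is a minimal sketch using the torch_mae and custom_quad_fn defined above (the starting parameters and the 0.1 nudge are only illustrative):
# compare the MAE before and after nudging parameter a up by 0.1
base   = torch_mae(custom_quad_fn(1.0, 1.0, 1.0)(x_trch), y_trch)
nudged = torch_mae(custom_quad_fn(1.1, 1.0, 1.0)(x_trch), y_trch)
print(base, nudged)  # if nudged < base, increasing a improves the fit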
4.1 Create Mean-Absolute-Error (MAE) Quadratic Function
This function takes in the parameters (coefficients) of a quadratic function and outputs the MAE.
- Input: coefficients of the quadratic
- Output: MAE (between the predictions of the quadratic with those coefficients and the actual values)
def mae_quad_fn(x_trch, y_trch, abc_params):
    # so abc_params = [2,3,4] -> creates a custom quad fn -> 2x^2 + 3x + 4
    quad_fn = custom_quad_fn(*abc_params)
    y_predicted_trch = quad_fn(x_trch)
    y_actual_trch = y_trch
    return torch_mae(y_predicted_trch, y_actual_trch)
def mse_quad_fn(x_trch, y_trch, abc_params):
    # so abc_params = [2,3,4] -> creates a custom quad fn -> 2x^2 + 3x + 4
    quad_fn = custom_quad_fn(*abc_params)
    y_predicted_trch = quad_fn(x_trch)
    y_actual_trch = y_trch
    return torch_mse(y_predicted_trch, y_actual_trch)
The chart shows a loss of MAE(2,2,2) = 1.4501; our mae_quad_fn also calculates a loss of 1.491.
mae_quad_fn(x_trch=x_trch, y_trch=y_trch, abc_params=[1.0, 1.0, 1.0])
tensor(2.6103, dtype=torch.float64)
A tensor is a PyTorch type that works with:
- lists (1D tensors)
- tables (2D tensors)
- layers of tables of numbers (3D tensors), and so on
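A quick illustration of those ranks (added here for reference, not part of the original cells):
t1 = torch.tensor([1.0, 2.0, 3.0])           # 1D: a list
t2 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2D: a table
t3 = torch.rand(2, 3, 4)                     # 3D: layers of tables
print(t1.ndim, t2.ndim, t3.ndim)             # 1 2 3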
4.2 Telling PyTorch to calculate gradients
By calling the method .requires_grad_(), our abc_rg tensor is now set up so that PyTorch will calculate gradients whenever we use the tensor.
# rank 1 tensor
abc_rg = torch.tensor([1.0, 1.0, 1.0])
abc_rg.requires_grad_()
tensor([1., 1., 1.], requires_grad=True)
abc_rg
tensor([1., 1., 1.], requires_grad=True)
4.2.1 Method .requires_grad_()
The grad_fn=<MeanBackward0> in the output below shows that gradients will be calculated for each parameter (our inputs).
loss = mae_quad_fn(x_trch, y_trch, abc_rg)
loss
tensor(2.6103, dtype=torch.float64, grad_fn=<MeanBackward0>)
4.2.2 Method .backward()
This adds an attribute .grad to our abc_rg tensor.
loss.backward()
4.2.3 Attribute .grad
This attribute tells us, if we increase the input at the corresponding position of this tensor slightly, whether the loss will increase (if the gradient is positive) or decrease (if it is negative).
abc_rg.grad
tensor([-1.3529, -0.0316, -0.5000])
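As a quick sanity check (a sketch added here, not a cell from the notebook), we can nudge only the first parameter and confirm the loss moves the way the sign of its gradient predicts:
# abc_rg.grad[0] is negative, so increasing a slightly should reduce the loss
with torch.no_grad():
    trial = abc_rg.detach().clone()
    trial[0] += 0.01
    print(mae_quad_fn(x_trch, y_trch, trial))  # expect a value slightly below 2.6103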
4.2.4 Increase our abc parameters and recalculate loss
with torch.no_grad():
print(f"loss before: {loss}")
-= abc_rg.grad * 0.01
abc_rg = mae_quad_fn(x_trch, y_trch, abc_rg)
loss print(f"loss after: {loss}")
loss before: 2.61030324932801
loss after: 2.5894896953092177
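To see where that drop comes from, each parameter moves by \(-0.01 \times\) its gradient: \(a\) becomes \(1.0 - 0.01 \times (-1.3529) \approx 1.0135\), \(b\) becomes \(\approx 1.0003\), and \(c\) becomes \(1.0 - 0.01 \times (-0.5) = 1.005\), which is enough to lower the MAE from 2.6103 to 2.5895.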
4.2.5 Automate it
Create a loop that decreases the loss by iteratively nudging each parameter against its gradient (increasing it where the gradient is negative, decreasing it where the gradient is positive).
for i in range(10):
    loss = mae_quad_fn(x_trch, y_trch, abc_rg)
    loss.backward()
    # note: .grad is never reset here, so gradients accumulate across steps
    with torch.no_grad():
        abc_rg -= abc_rg.grad * 0.01
    print(f"step {i}: {loss} - {abc_rg.grad}")
step 0: 2.5894896953092177 - tensor([-2.7058, -0.0632, -1.0000])
step 1: 2.547862587271633 - tensor([-4.0587, -0.0947, -1.5000])
step 2: 2.4854217639359875 - tensor([-5.4116, -0.1263, -2.0000])
step 3: 2.4021673865815485 - tensor([-6.7645, -0.1579, -2.5000])
step 4: 2.2980994552083187 - tensor([-8.1175, -0.1895, -3.0000])
step 5: 2.173217969816296 - tensor([-9.4704, -0.2211, -3.5000])
step 6: 2.0300959430578267 - tensor([-10.6892, -0.3684, -3.9000])
step 7: 1.883669135864714 - tensor([-11.9080, -0.5158, -4.3000])
step 8: 1.740979068220988 - tensor([-12.9396, -0.8000, -4.6000])
step 9: 1.5914231086209807 - tensor([-13.9712, -1.0842, -4.9000])
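The printed gradients grow each step because .grad accumulates when it is not reset. A sketch of the same loop with the gradient zeroed after each update (the more conventional form, not what was run above) would be:
for i in range(10):
    loss = mae_quad_fn(x_trch, y_trch, abc_rg)
    loss.backward()
    with torch.no_grad():
        abc_rg -= abc_rg.grad * 0.01
        abc_rg.grad.zero_()  # reset the gradient so steps don't accumulate
    print(f"step {i}: {loss}")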
5. Parameters are getting closer
The parameters started as 1, 1, 1 and are now 1.9, 1.0, 1.3; the underlying function was modelled with 3, 2, 1, so it's getting there!
[Future Iteration] How do we fix one parameter and move only the others?
abc_rg
tensor([1.8739, 1.0365, 1.3170], requires_grad=True)
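One possible answer to the question above (a hedged sketch, not from the notebook): zero out the gradient entry of the parameter you want to hold fixed before applying the update.
# hypothetical: keep b (index 1) fixed while a and c keep moving
loss = mae_quad_fn(x_trch, y_trch, abc_rg)
loss.backward()
with torch.no_grad():
    abc_rg.grad[1] = 0.0  # remove b's gradient so the update leaves it unchanged
    abc_rg -= abc_rg.grad * 0.01
    abc_rg.grad.zero_()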
To be Continued…
Next: a universal function called the ReLU function (rather than a quadratic function) is used for our modelling.
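For reference, a one-line sketch of that function (an illustrative definition, assuming the same PyTorch imports):
def relu(x): return torch.clip(x, 0.0)  # passes x through where x > 0, otherwise returns 0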
Neural Network Basics: Part 1
Neural Network Basics: Part 2
Neural Network Basics: Part 3