!pip install -Uqq fastai
Today I’ll be attempting to build my first deep learning image classifier to distinguish between rice and noodles using knowledge gained from Jeremy Howards Fast AI course
High-level steps:
1. Search and Prepare Data
2. Create DataLoader
3. Create Learner
4. Prediction
I will detail any problems, issues, questions and resolutions during the process.
from fastbook import *
c:\Users\tonyp\miniconda3\envs\fastai\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: '[WinError 127] The specified procedure could not be found'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
1. Search and Prepare Data
# 1.1 Get 'rice' photos
'rice',max_images=1)[0],'rice.jpg',show_progress=False)
download_url(search_images_ddg(open('rice.jpg').to_thumb(256,256) Image.
# 1.2 Get 'noodles' photos
'noodles', max_images=1)[0],'noodles.jpg',show_progress=False)
download_url(search_images_ddg(open('noodles.jpg').to_thumb(256,256) Image.
Lets use 60 imagess of ‘rice’ and ‘noodles’ from DuckDuckGo.
Note: I downloaded for 100 images of each and then taking 60 of them as some images fail so I’m leaving room for failed photos.
Question: Why do we need verify and why do some photos fail?
# 1.3 Prep images in folders
= ['rice', 'noodles']
searches = Path('rice_or_noodles')
path
if not path.exists(): # Ensure the path exists
for o in searches:
= (path/o)
dest =True, exist_ok=True)
dest.mkdir(parentsprint(f'Searching for {o} images...')
= search_images_ddg(f'{o} photo',max_images=100)
results print(f'{len(results)} images found for {o}. Downloading...')
=results[:60])
download_images(dest, urlsprint(f'Resizing images in {dest}')
=400, dest=dest) resize_images(dest, max_size
# 1.4 Remove Failed images
= Path('rice_or_noodles')
path = verify_images(get_image_files(path))
failed map(Path.unlink) failed.
(#0) []
2. Create DataLoader
# 2.1
= DataBlock(
dls = (ImageBlock, CategoryBlock), # i.e.input image / ouput is category (coin or notes)
blocks = get_image_files, # returns list of images files
get_items = RandomSplitter(valid_pct=0.2, seed=42), # critical to test accuracy with validation set
splitter =parent_label, # use parents folder of a path
get_y=[Resize(192, method="squish")] # most computer vision architecutres need all your inputs to be same size
item_tfms ).dataloaders(path)
c:\Users\tonyp\miniconda3\envs\fastai\Lib\site-packages\fastai\torch_core.py:263: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
return getattr(torch, 'has_mps', False)
# 2.2 We can see Paths were created for every image and split into our training and data sets
2]
dls.train_ds.items[:2] dls.valid_ds.items[:
[Path('rice_or_noodles/rice/4280fe58-691a-4c0b-85a5-5c1c8400ecb7.jpg'),
Path('rice_or_noodles/rice/f8a77d77-c007-4854-af8b-2af624a8da66.jpg')]
[Question]: How does it know whether it is training set or valid set? I guess theres some indexing somewhere that I dont know how to obtain.
# 2.1 Show a training batch which has an 'image' and a 'label'
=6) #batch shows input and label dls.show_batch(max_n
2. Create Learner using ResNet
In the course, we used a pre-trained model ‘ResNet18
’ (RN
).
Why Pre-trained Models?:
- Pre-trained models is like getting an athlete who is very good basic sport related skills like hand-eye coordination, jumping, running/sprinting, changing directions etc and then - telling them to learn a specific sport (fine-tuning), - say tennis (labelled dataset provided). With a good base of skills, this person should be able to learn tennis to a good level…
ResNet18:
- ResNet18 is trained on 1.28 million images with 1000 object categories. - 18 layers
- Trained on ImageNet dataset
[Future iterations 1]: Perhaps there are alternative pre-trained models specialising in food?
[Future iterations 2]: - Read up and try understand the various architectures Fast AI’s TIMM model architectures - Try different architectures and different versions
= vision_learner(dls, resnet18, metrics=error_rate) learner_RN18
2.1 Learner Model Times:
They all took under 10 seconds to create the general learner. Now to fine-tune them!
8) learner_RN18.fine_tune(
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.840357 | 4.676042 | 0.476190 | 00:03 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.763106 | 3.761843 | 0.476190 | 00:04 |
1 | 1.517361 | 2.798523 | 0.476190 | 00:04 |
2 | 1.202234 | 2.308116 | 0.428571 | 00:04 |
3 | 0.953227 | 1.637496 | 0.428571 | 00:04 |
4 | 0.770979 | 1.034023 | 0.380952 | 00:04 |
5 | 0.662257 | 0.641428 | 0.190476 | 00:04 |
6 | 0.563239 | 0.405057 | 0.142857 | 00:04 |
7 | 0.490904 | 0.285846 | 0.095238 | 00:04 |
Our learner is performing at 90% accuracy (9% error rate) by looking at only 60 photos!
Lets try predict some random photos of rice and noodles I’ve found on the internet.
from IPython.display import Image # import image viewer
# noodle predictor
= SimpleNamespace(data = ['test_noodle.jpg'])
uploader = uploader.data[0]
image_path =image_path))
display(Image(filename= learner_RN18.predict(image_path)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
c:\Users\tonyp\miniconda3\envs\fastai\Lib\site-packages\fastai\torch_core.py:263: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
return getattr(torch, 'has_mps', False)
noodles: 99.98%
Prediction 1: Noodles
The model predicted noodles correctly with 99.98% confidence!
# rice predictor 1
= SimpleNamespace(data = ['test_rice.jpg'])
uploader = uploader.data[0]
image_path =image_path))
display(Image(filename
= learner_RN18.predict(image_path)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
noodles: 66.22%
Prediction and Results 2: Rice 1
The model predicted rice incorrectly with 66.22% confidence!
I was a bit confused so I decided to provide another image of rice to make
# rice predictor 2
= SimpleNamespace(data = ['test_rice2.jpg'])
uploader = uploader.data[0]
image_path =image_path)) # show image
display(Image(filename
# get
= learner_RN18.predict(image_path)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
noodles: 98.51%
Prediction and Results 3: Rice 2
The model predicted rice incorrectly with 98.51% confidence!
Okay now there is clearly something wrong going on. I decide to take a gander at the photos in my ‘rice’ folder.
It looks like we’ve trained a learner specialises in bowled or white rice. I was testing the model with fried rice since that is my favourite rice dish.
Lets test out a couple photos on bowled rice.
# rice predictor 2
= SimpleNamespace(data = ['test_boiledrice1.jpg'])
uploader1 = SimpleNamespace(data = ['test_boiledrice2.jpg'])
uploader2 = uploader1.data[0]
image_path1 = uploader2.data[0]
image_path2
=image_path1)) # show image
display(Image(filename=image_path2)) # show image
display(Image(filename
= learner_RN18.predict(image_path1)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
= learner_RN18.predict(image_path2)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
rice: 88.73%
noodles: 92.57%
Now I’m confused as its predicting incorrectly with 92.57% confidence.
Perhaps the model isnt seeing enough data?
Lets train a new model with:
- 300 images instead of 60
- ‘rice food’ and ‘noodle food’ as keyword insteads of just ‘rice’ and ‘noodles’
= ['rice food', 'noodles food']
searches = Path('rice_or_noodles_300')
path_200
if not path_200.exists(): # Ensure the path exists
for o in searches:
= (path_200/o)
dest =True, exist_ok=True)
dest.mkdir(parentsprint(f'Searching for {o} images...')
= search_images_ddg(f'{o} photo',max_images=300)
results print(f'{len(results)} images found for {o}. Downloading...')
=results[:200])
download_images(dest, urlsprint(f'Resizing images in {dest}')
=400, dest=dest) resize_images(dest, max_size
# 1.4 Remove Failed images
= Path('rice_or_noodles_300')
path_200 = verify_images(get_image_files(path_200))
failed map(Path.unlink)
failed.
(#10) [None,None,None,None,None,None,None,None,None,None]
= DataBlock(
dls_200 = (ImageBlock, CategoryBlock), # i.e.input image / ouput is category (coin or notes)
blocks = get_image_files, # returns list of images files
get_items = RandomSplitter(valid_pct=0.2, seed=42), # critical to test accuracy with validation set
splitter =parent_label, # use parents folder of a path
get_y=[Resize(192, method="squish")] # most computer vision architecutres need all your inputs to be same size
item_tfms ).dataloaders(path_200)
= vision_learner(dls_200, resnet18, metrics=error_rate) learner_RN18_200
4) learner_RN18_200.fine_tune(
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.155098 | 0.872050 | 0.338462 | 00:11 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.625260 | 0.402908 | 0.169231 | 00:15 |
1 | 0.442973 | 0.289800 | 0.138462 | 00:14 |
2 | 0.317375 | 0.328805 | 0.153846 | 00:14 |
3 | 0.235606 | 0.327507 | 0.123077 | 00:15 |
# Prediction with new learner (300 images and specific keywords)
# rice predictor 2
= SimpleNamespace(data = ['test_boiledrice1.jpg'])
uploader1 = SimpleNamespace(data = ['test_boiledrice2.jpg'])
uploader2 = uploader1.data[0]
image_path1 = uploader2.data[0]
image_path2
=image_path1)) # show image
display(Image(filename=image_path2)) # show image
display(Image(filename
= learner_RN18_200.predict(image_path1)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
= learner_RN18_200.predict(image_path2)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
rice food: 100.00%
rice food: 99.95%
So it’s now 100 and 99.95% confident they’re rice, which is great!
Lets try some fried rice!
We’ll retest now at the fried rice photo which the initial model guessed to be noodles with 98.5% confidence
# Prediction with new learner (300 images and specific keywords)
# rice predictor 2
= SimpleNamespace(data = ['test_rice2.jpg'])
uploader1 = uploader1.data[0]
image_path1
=image_path1))
display(Image(filename
= learner_RN18_200.predict(image_path1)
res1, res2, res3 print(f"{res1}: {res3[res2]*100:.2f}%")
rice food: 99.56%
Great! It is correct with 99.56% confidence.
I think we’ve created a great rice and noodles classifier, lets stop here.
[Future Iteration 3]: Build web app for everyone to test it out
[Future Iteration 4]: Make it useable on my blog
[Question] I wonder if theres a way to quickly see all specific headings I’ve used, I find myself scrolling up and download to find what Iteration I’m up to…
Apologies for the lack of neatness, lets hope this improves over time…