CNN: Convolutional Neural Networks

Apr 22, 2023

Have you ever wondered how facial recognition on your smart gadgets works? Or how you can communicate with them in your native tongue? Convolutional Neural Networks (CNNs) help achieve these objectives. Read on for some insights.

BACKGROUND

A digital image is made up of pixels, i.e. picture elements. The computer reads an image in two steps:

Image breakdown: The digital image is broken down into its RGB channels, namely Red, Green, and Blue. Each pixel is mapped to one value in each color channel.
Pixel Value Recognition: The system reads the value of each pixel in each channel (typically 0 to 255). Together with the image's height, width, and number of channels, these values determine the size of the input.

Grey-scale images need only a single channel.
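As a rough illustration of how an image becomes numbers, the sketch below stores a tiny 2x2 image as nested Python lists, splits it into its three color channels, and collapses it to a single grey-scale channel. The pixel values, the helper names `split_channels` and `to_greyscale`, and the plain averaging used for grey-scale are assumptions made for this example.

```python
# A hypothetical 2x2 RGB image: each pixel is an (R, G, B) triple in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (90, 120, 150)],
]

def split_channels(img):
    """Return three 2-D lists holding the red, green and blue values."""
    return [[[pixel[c] for pixel in row] for row in img] for c in range(3)]

def to_greyscale(img):
    """Collapse RGB to one channel by plain averaging (an assumed, simple rule)."""
    return [[sum(pixel) // 3 for pixel in row] for row in img]

red, green, blue = split_channels(image)
grey = to_greyscale(image)

# The "size" of the image is its height, width and number of channels.
height, width, channels = len(image), len(image[0]), len(image[0][0])
print(height, width, channels)
print(red)
print(grey)
```

Real pipelines usually hold the same data in an array of shape (height, width, channels) rather than nested lists.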

INTRODUCTION

Convolutional Neural Networks, or CNNs for short, are similar to other neural networks used in machine learning: they consist of neurons (the basic learning units) with learnable weights and biases. Each unit receives inputs, takes their weighted sum, and passes it through an activation function to produce an output.
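A single unit of this kind can be sketched in a few lines. The input values, weights, bias, and the choice of a sigmoid activation below are all assumptions picked for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical inputs and parameters; training would adjust weights and bias.
output = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1)
print(round(output, 4))
```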

Consider CAPTCHA validation as an example: the network must decide whether a chosen image shows a traffic light. If the neurons have seen similar images before, the label “traffic light” will be activated. The more labeled images the neurons see, the more reliable the validation becomes; this process is called training.

Fully connected networks do not scale well to images. Even a modest image of 200x200x3 pixels would give every neuron in the first hidden layer 200 x 200 x 3 = 120,000 weights. With many neurons per layer and several layers, the number of parameters needed to process an entire image set becomes very large, which is why CNNs connect each neuron only to a small region of the input.
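The arithmetic behind this comparison can be checked directly. In the sketch below, the hidden-layer width of 1,000 units and the 32 filters of size 3x3 are assumed figures chosen only to make the contrast concrete.

```python
# Input size taken from the text: a 200x200 image with 3 color channels.
height, width, channels = 200, 200, 3

# Fully connected: every neuron sees every input value (plus one bias).
weights_per_dense_neuron = height * width * channels
hidden_units = 1000  # assumed layer width
dense_params = (weights_per_dense_neuron + 1) * hidden_units

# Convolutional: every filter sees only a small window (plus one bias).
kernel_size, filters = 3, 32  # assumed filter size and count
conv_params = (kernel_size * kernel_size * channels + 1) * filters

print(weights_per_dense_neuron)
print(dense_params)
print(conv_params)
```

Because a convolution filter reuses the same few weights at every position, its parameter count does not grow with the image size.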

OVERFITTING AND UNDERFITTING

Overfitting

Suppose you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all the taxi drivers there are thieves. This tendency, called overgeneralization, is quite common in humans, and machines can fall into the same trap if care is not taken. In machine learning, this is called overfitting: the model performs well on the training data but does not generalize well to new data.

To address overfitting, techniques such as dropout, early stopping, and regularization can be used to prevent the model from fitting the training data too closely. Dropout involves randomly dropping out nodes in the neural network during training to prevent co-adaptation of neurons.
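Dropout can be sketched without any framework. The version below is the common "inverted dropout" variant, which rescales the surviving activations so their expected value is unchanged; the function name, the 50% rate, and the seed are assumptions for this example, not part of the article's model.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Randomly zero a fraction `rate` of activations during training.

    Survivors are divided by the keep probability (inverted dropout).
    At inference time the activations pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0] * 10
dropped = dropout(hidden, rate=0.5, rng=rng)
unchanged = dropout(hidden, rate=0.5, training=False)
print(dropped)
```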

Underfitting

Underfitting is the opposite of overfitting, i.e. it occurs when your model is too simple to learn the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfit: reality is just more complex than the model, so its predictions are bound to be inaccurate, even in the training examples.

To address underfitting, the model architecture can be made more complex by adding more layers or neurons, or by using a different activation function. Additionally, more data can be gathered to improve the model’s ability to capture the underlying patterns.

HISTORY OF CNN

Although research on artificial neural networks dates back to Rosenblatt's work in the 1960s, it took more than 40 years for deep learning with neural networks to take off. The limiting factors were computation power and datasets, and Google was among those driving research into deep learning. In July 2012, Google researchers exposed a large neural network to a series of unlabelled, static images sliced from YouTube videos, and the network learned a cat-detecting neuron on its own.

The CNN model works on a four-layered concept:

Convolution
ReLU
Pooling
Full Connectedness

Four-layered Concept

I. Convolution of Image:

Convolution lets the network detect a feature wherever it appears in the image (translation invariance): each convolution filter represents a feature of interest, such as a stroke of a letter of the alphabet, and the CNN slides it across the image to find matches. There are four steps for convolution:

Line up the feature and the image patch.
Multiply each image pixel by the corresponding feature pixel.
Add the values obtained.
Divide the sum by the total number of pixels in the feature.
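The four steps above can be sketched in plain Python. The 3x3 image, the 2x2 feature, and the helper names `match_feature` and `convolve` are assumptions for illustration; pixels are coded as +1 and -1 so that a perfect match scores 1.0.

```python
def match_feature(image, feature, top, left):
    """Steps 1-4 for one position: line up, multiply, add, divide."""
    total = 0.0
    for i, row in enumerate(feature):
        for j, f in enumerate(row):
            total += image[top + i][left + j] * f   # multiply and add
    return total / (len(feature) * len(feature[0]))  # divide by pixel count

def convolve(image, feature):
    """Slide the feature over every position where it fits entirely."""
    out_h = len(image) - len(feature) + 1
    out_w = len(image[0]) - len(feature[0]) + 1
    return [[match_feature(image, feature, r, c) for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical image containing a diagonal line, and a diagonal feature.
image = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]
feature = [
    [ 1, -1],
    [-1,  1],
]

feature_map = convolve(image, feature)
print(feature_map)  # [[1.0, -0.5], [-0.5, 1.0]]
```

The two positions on the diagonal score 1.0 because the feature matches the patch exactly; the off-diagonal positions score lower.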

In mathematical terms, convolution is an operation on two functions f and g (for instance, Schwartz functions). Over a finite range [0, t] it is defined as:

(f * g)(t) = ∫₀ᵗ f(t − x) g(x) dx

The reasons to convolve an image are to smoothen, sharpen, intensify, and enhance the image.
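For images, the integral becomes a finite sum over the pixels covered by the filter. As a sketch, assuming a single-channel image I and a k x k filter K (symbols introduced here only for illustration), the discrete two-dimensional convolution can be written as:

```latex
(I * K)(i, j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} I(i - m,\; j - n)\, K(m, n)
```

In practice, deep-learning libraries usually compute the closely related cross-correlation, which uses I(i + m, j + n) and so skips the filter flip. Because the filter values are learned during training, the distinction rarely matters.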

II. ReLU Layer:

Rectified Linear Unit, or ReLU, is an activation function that activates a node only if its input is above zero. When the input is zero or negative, the output is zero; when the input is positive, the output increases linearly with it. The function f(x) defining ReLU is the maximum of zero and the input x.

f(x)=max(0,x)

The main aim of ReLU is to remove the negative values from the convolution output: positive values remain unchanged, while negative values are set to zero.
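Applied to a feature map, this is a one-line function. The feature-map values below are made up for the example.

```python
def relu(x):
    """f(x) = max(0, x): negatives become zero, positives pass through."""
    return max(0.0, x)

def relu_map(feature_map):
    """Apply ReLU to every value of a 2-D feature map."""
    return [[relu(v) for v in row] for row in feature_map]

# Hypothetical output of a convolution step.
feature_map = [[1.0, -0.5], [-0.5, 1.0]]
activated = relu_map(feature_map)
print(activated)  # [[1.0, 0.0], [0.0, 1.0]]
```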

III. Pooling Layer:

Here, the image stack shrinks to a smaller size. It is done after passing through the activation layer by implementing the following four steps:

Pick a window size (usually 2 or 3).
Pick a stride (usually 2).
Walk your window across the filtered images.
From each window, take the maximum value.
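The four pooling steps can be sketched as follows. The 4x4 feature map is invented for the example, and this version simply skips windows that would run past the edge, which is one of several possible edge-handling choices.

```python
def max_pool(feature_map, window=2, stride=2):
    """Walk a window across the map and keep the maximum of each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - window + 1, stride):
        pooled_row = []
        for left in range(0, cols - window + 1, stride):
            patch = [feature_map[top + i][left + j]
                     for i in range(window) for j in range(window)]
            pooled_row.append(max(patch))
        pooled.append(pooled_row)
    return pooled

# Hypothetical 4x4 feature map after the ReLU step.
feature_map = [
    [1.0, 0.0, 0.2, 0.0],
    [0.0, 0.5, 0.0, 0.3],
    [0.1, 0.0, 0.9, 0.0],
    [0.0, 0.4, 0.0, 0.7],
]

pooled = max_pool(feature_map, window=2, stride=2)
print(pooled)  # [[1.0, 0.3], [0.4, 0.9]]
```

With a 2x2 window and a stride of 2, each side of the map is halved, so the 4x4 input shrinks to 2x2.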

IV. Full Connectedness:

In this layer, the neurons are fully connected to all activations in the previous layer. Fully connected (FC) layers are always placed at the end of the network, i.e. no convolution layer follows a fully connected one. It is common practice to use one or two FC layers before the softmax classifier, as shown below:

INPUT => CONV => RELU => POOL => CONV => RELU => POOL => FC => FC
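The softmax classifier at the end of this pipeline turns the raw scores of the last FC layer into class probabilities. A minimal sketch, with made-up scores for three classes:

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical outputs of the final fully connected layer.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most likely class
print([round(p, 3) for p in probs], predicted)
```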

CNN MODEL FOR CIFAR-10 PHOTO CLASSIFICATION USING PYTHON

CIFAR-10:

CIFAR stands for the Canadian Institute For Advanced Research. CIFAR-10 is a dataset of 60,000 color photographs, each 32x32 pixels, of objects from 10 classes. Each class has an integer value associated with it:

0: airplane
1: automobile
2: bird
3: cat
4: deer
5: dog
6: frog
7: horse
8: ship
9: truck
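When inspecting predictions it helps to turn these integers back into names. The list and helper below are a small convenience sketch; `CLASS_NAMES` and `label_name` are names chosen here, not part of the Keras API.

```python
# Class names in the order of their integer labels, as listed above.
CLASS_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def label_name(label):
    """Map an integer class label (0-9) to its human-readable name."""
    return CLASS_NAMES[int(label)]

print(label_name(4))  # deer
```

The Keras loader returns labels with shape (n, 1), so a single training label would be looked up as `label_name(trainy[i][0])`.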

Load CIFAR-10 using Keras API

The example below loads the CIFAR-10 and creates a plot of the first nine images in the training set.

from matplotlib import pyplot
from keras.datasets import cifar10

# load dataset
(trainX, trainy), (testX, testy) = cifar10.load_data()

# summarize loaded dataset
print('Train: X=%s, y=%s' % (trainX.shape, trainy.shape))
print('Test: X=%s, y=%s' % (testX.shape, testy.shape))

# plot first few images
for i in range(9):
 # define subplot
 pyplot.subplot(330 + 1 + i)
 # plot raw pixel data
 pyplot.imshow(trainX[i])

# show the figure
pyplot.show()

Running the example loads the CIFAR-10 train and test datasets and prints their shapes.

Train: X=(50000, 32, 32, 3), y=(50000, 1)
Test: X=(10000, 32, 32, 3), y=(10000, 1)

MODEL EVALUATION TEST HARNESS

Load Dataset

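# imports needed by this snippet
from keras.datasets import cifar10
from keras.utils import to_categorical

# load dataset
(trainX, trainY), (testX, testY) = cifar10.load_data()

# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)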
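To see what one hot encoding does to a label, here is a plain-Python sketch of the idea for a single integer label. The helper `one_hot` is hypothetical and only mimics the result conceptually; it is not how Keras implements `to_categorical`.

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at the label's index."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

encoded = one_hot(3)  # class 3 is 'cat' in CIFAR-10
print(encoded)
```

Each label becomes a 10-element vector, which matches the 10 outputs of a softmax layer.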

Prepare Pixel Data

# convert from integers to floats (trainX and testX are the loaded image arrays)
train_norm = trainX.astype('float32')
test_norm = testX.astype('float32')

# normalize to range 0-1
train_norm = train_norm / 255.0
test_norm = test_norm / 255.0

Define Model

# define cnn model
def define_model():
  model = Sequential()
  # ...
  return model

Evaluate Model

# fit model
history = model.fit(trainX, trainY, epochs=100, batch_size=64, validation_data=(testX, testY), verbose=0)

#evaluate model
_, acc = model.evaluate(testX, testY, verbose=0)
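The accuracy value returned by the evaluation step is the fraction of images whose most probable predicted class matches the true class. Keras computes this internally; the sketch below only illustrates the idea on four made-up predictions over three classes, and the helper name `accuracy` is an assumption.

```python
def accuracy(predictions, one_hot_labels):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = 0
    for probs, target in zip(predictions, one_hot_labels):
        if probs.index(max(probs)) == target.index(max(target)):
            correct += 1
    return correct / len(one_hot_labels)

# Hypothetical softmax outputs and their one hot encoded true labels.
predictions = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

acc = accuracy(predictions, labels)
print('> %.3f' % (acc * 100.0))  # > 75.000
```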

Show the result

# plot diagnostic learning curves
def summarize_diagnostics(history):

 # plot loss
 pyplot.subplot(211)
 pyplot.title('Cross Entropy Loss')
 pyplot.plot(history.history['loss'], color='blue', label='train')
 pyplot.plot(history.history['val_loss'], color='orange', label='test')

 # plot accuracy
 pyplot.subplot(212)
 pyplot.title('Classification Accuracy')
 pyplot.plot(history.history['accuracy'], color='blue', label='train')
 pyplot.plot(history.history['val_accuracy'], color='orange', label='test')

 # save plot to file
 filename = sys.argv[0].split('/')[-1]
 pyplot.savefig(filename + '_plot.png')
 pyplot.close()

Now, one can also report the final model performance on the test dataset. This can be achieved by printing the classification accuracy directly.

print('> %.3f' % (acc * 100.0))

COMPLETE CODE

Given below is the complete code of the model trained:

# test harness for evaluating models on the cifar10 dataset
import sys
from matplotlib import pyplot
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD

# load train and test dataset
def load_dataset():
 # load dataset
 (trainX, trainY), (testX, testY) = cifar10.load_data()
 # one hot encode target values
 trainY = to_categorical(trainY)
 testY = to_categorical(testY)
 return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
 # convert from integers to floats
 train_norm = train.astype('float32')
 test_norm = test.astype('float32')
 # normalize to range 0-1
 train_norm = train_norm / 255.0
 test_norm = test_norm / 255.0
 # return normalized images
 return train_norm, test_norm

# define cnn model
def define_model():
 model = Sequential()
 # ...
 return model

# plot diagnostic learning curves
def summarize_diagnostics(history):
 # plot loss
 pyplot.subplot(211)
 pyplot.title('Cross Entropy Loss')
 pyplot.plot(history.history['loss'], color='blue', label='train')
 pyplot.plot(history.history['val_loss'], color='orange', label='test')
 # plot accuracy
 pyplot.subplot(212)
 pyplot.title('Classification Accuracy')
 pyplot.plot(history.history['accuracy'], color='blue', label='train')
 pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
 # save plot to file
 filename = sys.argv[0].split('/')[-1]
 pyplot.savefig(filename + '_plot.png')
 pyplot.close()

# run the test harness for evaluating a model
def run_test_harness():
 # load dataset
 trainX, trainY, testX, testY = load_dataset()
 # prepare pixel data
 trainX, testX = prep_pixels(trainX, testX)
 # define model
 model = define_model()
 # fit model
 history = model.fit(trainX, trainY, epochs=100, batch_size=64, validation_data=(testX, testY), verbose=0)
 # evaluate model
 _, acc = model.evaluate(testX, testY, verbose=0)
 print('> %.3f' % (acc * 100.0))
 # learning curves
 summarize_diagnostics(history)

# entry point, run the test harness
run_test_harness()

This test harness can evaluate any CNN model on the CIFAR-10 dataset and can run on a GPU or CPU. As given, however, define_model() is only a placeholder, so its layers must be filled in before the harness can train anything. Below is another example that loads and prepares an image, loads a saved model, and predicts the class of the image; for the sample image the expected result is class ‘4’ (deer).

# make a prediction for a new image.
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.models import load_model

# load and prepare the image
def load_image(filename):
 # load the image
 img = load_img(filename, target_size=(32, 32))
 # convert to array
 img = img_to_array(img)
 # reshape into a single sample with 3 channels
 img = img.reshape(1, 32, 32, 3)
 # prepare pixel data
 img = img.astype('float32')
 img = img / 255.0
 return img

# load an image and predict the class
def run_example():
 # load the image
 img = load_image('sample_image.png')
 # load model
 model = load_model('final_model.h5')
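 # predict the class (predict_classes was removed from recent Keras releases,
 # so take the index of the highest predicted probability instead)
 result = model.predict(img).argmax(axis=-1)
 print(result[0])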

# entry point, run the example
run_example()

CONCLUSION

CNNs are a popular deep-learning technique for current visual recognition tasks. Their performance depends heavily on the size and quality of the training set.
These networks can surpass humans at some visual recognition tasks; however, they are still not robust to visual artifacts such as glare and noise, which humans handle well.
The theory is still being developed, and researchers are working to endow CNNs with properties such as active attention and online memory.