Neural Network in Practice with Python and Keras

by Alex

What is machine learning and why is it important?

Machine learning is a field of artificial intelligence that uses statistical methods to give computer systems the ability to “learn”, that is, to gradually improve performance on a particular task using data, without being explicitly programmed. Good examples are how effectively (or not so effectively) Gmail recognizes spam, and how much better voice recognition systems have become with the arrival of Siri, Alexa, and Google Home. Machine learning is used to solve problems such as the following:

  • Fraud detection – tracking unusual patterns in credit card or bank account transactions
  • Prediction – predicting the future price of stocks, currency exchange rates or cryptocurrencies
  • Image recognition – identifying objects and faces in pictures

Machine learning is a huge field, and today we’ll talk about just one part of it.

Supervised learning

Supervised learning is one type of machine learning. The idea is that the system is first taught to understand past data by being given many examples of a particular problem together with the desired output. Then, once the system is “trained”, it can be given new input data and asked to predict the output. For example, how do you create a spam detector? One way is through intuition: you can manually define rules, such as “contains the word money” or “includes the phrase Western Union”. Such systems sometimes work, but most of the time it is hard to define patterns based solely on intuition. With supervised learning, you can train a system to learn the underlying rules and patterns by providing many labeled examples of spam and legitimate mail. Once such a detector is trained, it can be given a new email and asked to predict whether it is spam. Two types of problems are solved with supervised learning: regression and classification.

  • In regression problems, we try to predict a continuous output. For example, predicting the price of a house based on data about its size.
  • In classification problems, we predict one of a discrete set of qualitative labels. For example, trying to predict whether an email is spam based on the number of words in it.
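The difference between the two problem types can be sketched with a few lines of NumPy. This is only an illustration under loud assumptions: the house sizes, prices, and word counts are made-up numbers, a least-squares line stands in for a regression model, and a nearest-mean rule stands in for a classifier.

```python
import numpy as np

# Regression: predict a continuous value (house price) from size.
sizes = np.array([50.0, 80.0, 100.0, 120.0])     # made-up sizes
prices = np.array([110.0, 170.0, 210.0, 250.0])  # made-up prices (2 * size + 10)
slope, intercept = np.polyfit(sizes, prices, deg=1)  # least-squares line
predicted_price = slope * 90.0 + intercept           # a continuous number

# Classification: predict a discrete label (0 = not spam, 1 = spam)
# from the number of words, using the nearest class mean.
word_counts = np.array([20.0, 35.0, 40.0, 200.0, 250.0, 300.0])  # made up
is_spam = np.array([0, 0, 0, 1, 1, 1])                           # made up
class_means = np.array([word_counts[is_spam == c].mean() for c in (0, 1)])

def classify(word_count):
    """Return the label whose class mean is closest to word_count."""
    return int(np.argmin(np.abs(class_means - word_count)))

print(predicted_price)               # a price, any real number
print(classify(30), classify(280))   # labels, always 0 or 1
```

The regression output can be any real number, while the classifier can only ever return one of the predefined labels.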

You can’t talk about supervised machine learning without touching on supervised learning models. It’s like talking about programming without touching on programming languages or data structures. Learning models are the very structures that get trained. Their weights (or structure) change as they form an understanding of what needs to be predicted. There are several kinds of learning models, such as:

  • Random forest
  • Naive Bayes classifier
  • Logistic regression
  • k-nearest neighbors

This material will use a neural network as a model.

Understanding how neural networks work

Neural networks are so named because their internal structure is supposed to mimic the human brain, which consists of neurons and the synapses that connect them. When stimulated, neurons “activate” others by means of electricity. In an artificial network, each neuron is “activated” by first calculating a weighted sum of its inputs and then passing the result through an activation function. When a neuron is activated, it in turn activates others, which perform similar calculations, causing a chain reaction across the neurons of all layers. It is worth noting that even though neural networks are inspired by biological networks, the two are not directly comparable.

The following steps describe the activation process that each neuron goes through, reading the diagram from left to right:

  • All inputs (numeric values) from incoming neurons are read. They are denoted x1…xn.
  • Each input is multiplied by the weight associated with that connection. The weights are denoted W1j…Wnj.
  • All weighted inputs are summed and passed to the activation function, which transforms the sum into a new numerical value.
  • The numerical value returned by this function becomes an input for a neuron in the next layer.
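The activation steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements it: the input and weight values are made up, ReLU is just one possible activation function, and the bias term b is an assumption that the description above does not mention.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(z, 0.0)

def neuron_output(x, w, b=0.0, activation=relu):
    """Weighted sum of the inputs followed by the activation function."""
    weighted_sum = np.dot(x, w) + b   # x1*W1j + ... + xn*Wnj (+ assumed bias)
    return activation(weighted_sum)

x = np.array([1.0, 2.0, 3.0])     # inputs x1..xn (made-up values)
w = np.array([0.5, -0.25, 0.5])   # weights W1j..Wnj (made-up values)

y = neuron_output(x, w)   # 0.5 - 0.5 + 1.5 = 1.5, unchanged by ReLU
print(y)
```

The returned value y is what would be passed on as an input to neurons in the next layer.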

Layers of a neural network

Neurons within a neural network are organized into layers, each of which contains one or more neurons. A neural network usually has three or more layers. Two special layers are always defined: the input layer and the output layer.

  • The input layer is the entry point into the neural network. In programming terms, it can be thought of as a function argument.
  • The output layer produces the result of the neural network. In programming terms, it is the value returned by the function.

The layers between them are described as “hidden layers”. This is where all the calculations take place. Each layer in the network can be thought of as encoding a description of the features of its input.
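The function analogy can be made concrete with a small NumPy sketch of a network with an input layer, one hidden layer, and an output layer. The layer sizes and the random weights are hypothetical; a real network would learn its weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
n_inputs, n_hidden, n_outputs = 4, 3, 2

# Random (untrained) weights and zero biases for each layer.
W_hidden = rng.normal(size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
W_output = rng.normal(size=(n_hidden, n_outputs))
b_output = np.zeros(n_outputs)

def forward(x):
    """The network as a function: input vector in, output vector out."""
    hidden = np.maximum(x @ W_hidden + b_hidden, 0.0)  # hidden layer with ReLU
    return hidden @ W_output + b_output                # output layer

x = np.array([0.1, 0.2, 0.3, 0.4])   # the "function argument" (input layer)
result = forward(x)                  # the "return value" (output layer)
print(result.shape)
```

The input vector must have as many values as the input layer has neurons, and the result has as many values as the output layer.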

Choosing the number of hidden layers and neurons

There is no golden rule for choosing the number of hidden layers and their size (the number of neurons). As a rule of thumb, it’s worth trying at least one hidden layer and then adjusting its size to see what works best.

Using the Keras library to train a simple neural network that recognizes handwritten digits

Python programmers don’t need to reinvent the wheel. Libraries such as TensorFlow, Torch, Theano, and Keras already define the basic data structures of a neural network, leaving only the need to describe the structure of the network declaratively. Keras also provides some freedom: you can choose the number of layers, the number of neurons, the type of each layer, and the activation function. In practice there are many more options, but this time simple examples will do. As already mentioned, there are two special layers whose sizes must be defined based on the particular problem: the input layer and the output layer. All other “hidden layers” are used to learn the complex nonlinear abstractions of the problem. In this article, we will use Python and the Keras library to recognize handwritten digits from the MNIST database.

Running Jupyter Notebook locally

If you haven’t worked with Jupyter Notebook yet, check out the Jupyter Notebook Beginner’s Guide first. List of required libraries:

  • numpy
  • matplotlib
  • scikit-learn (imported as sklearn)
  • tensorflow

Running from the Python interpreter

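To run the code on a clean installation of Python (version 3.6 or newer), install the required modules using pip.

I recommend (though it is not required) that you run the code in a virtual environment. The leading ! runs the command from a Jupyter Notebook cell; omit it when typing the commands in a terminal.

!pip install numpy
!pip install matplotlib
!pip install scikit-learn
!pip install tensorflow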

If these modules are installed, you can now run all the code in the project. Import modules and libraries:

import numpy as np
import matplotlib.pyplot as plt
import gzip
from typing import List
from sklearn.preprocessing import OneHotEncoder
import tensorflow.keras as keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

MNIST database

MNIST is a large database of handwritten digits that is used as a benchmark and as an entry point to machine learning and image processing systems. It is ideally suited for focusing specifically on the neural network training process. MNIST is a very clean database, which is a luxury in the world of machine learning.

The goal

The goal is to train a system to classify each image with the appropriate label (the digit pictured), using a dataset of 60,000 handwritten digit images (each a 28×28 pixel grayscale image with pixel values from 0 to 255).

Data Set

The dataset consists of training and test data, but for simplicity only the training data will be used here. Here’s how to download it:

rm -Rf train-images-idx3-ubyte.gz
rm -Rf train-labels-idx1-ubyte.gz
wget -q http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget -q http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz

Reading Labels

There are 10 digits (0-9), so each label must be a digit from 0 to 9. The downloaded file, train-labels-idx1-ubyte.gz, encodes the labels as follows.

Training Set Label File (train-labels-idx1-ubyte):

[offset]  [type]          [value]           [description]
0000      32 bit integer  0x00000801(2049)  magic number (MSB first)
0004      32 bit integer  60000             number of items
0008      unsigned byte   ??                label
0009      unsigned byte   ??                label
…..       …..             …..               …..
xxxx      unsigned byte   ??                label

Label values range from 0 to 9. The first 8 bytes (the first two 32-bit integers) can be skipped because they contain file metadata (the magic number and the number of items), which is mostly needed in lower-level programming languages. To parse the file, you need to do the following:

  • Open the file with the gzip library so it can be decompressed
  • Read the entire array of bytes into memory
  • Skip the first 8 bytes
  • Go through each byte and cast it to an integer

Note: if this file came from an untrusted source, many more checks would be needed. But let’s assume that this particular one is reliable and suitable for the purposes of this article.

with gzip.open('train-labels-idx1-ubyte.gz') as train_labels:
    data_from_train_file = train_labels.read()

# Skip the first 8 bytes (magic number and number of items)
label_data = data_from_train_file[8:]
assert len(label_data) == 60000

# Convert each byte to an integer.
# This will be a number between 0 and 9
labels = [int(label_byte) for label_byte in label_data]
assert min(labels) == 0 and max(labels) == 9
assert len(labels) == 60000
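For a file from a less trusted source, the header could be checked instead of skipped. The sketch below is one possible way to do that with the standard struct module, based only on the layout in the table above. The function name read_idx1_labels is hypothetical, and the sample bytes are synthetic so that the example runs without the real file.

```python
import gzip
import struct

def read_idx1_labels(raw: bytes) -> list:
    """Parse label-file bytes, validating the header instead of skipping it."""
    if len(raw) < 8:
        raise ValueError('file too short to contain a header')
    # Two big-endian (MSB first) unsigned 32-bit integers.
    magic, count = struct.unpack('>II', raw[:8])
    if magic != 2049:
        raise ValueError(f'unexpected magic number: {magic}')
    body = raw[8:]
    if len(body) != count:
        raise ValueError(f'expected {count} labels, found {len(body)}')
    labels = list(body)   # iterating over bytes yields integers
    if any(label > 9 for label in labels):
        raise ValueError('label outside the range 0-9')
    return labels

# Synthetic three-label file, compressed and decompressed in memory.
sample = gzip.compress(struct.pack('>II', 2049, 3) + bytes([5, 0, 4]))
parsed = read_idx1_labels(gzip.decompress(sample))
print(parsed)
```

With the real file, the bytes returned by gzip.open(...).read() could be passed to the same function.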

Reading Images

[offset]  [type]          [value]           [description]
0000      32 bit integer  0x00000803(2051)  magic number
0004      32 bit integer  60000             number of images
0008      32 bit integer  28                number of rows
0012      32 bit integer  28                number of columns
0016      unsigned byte   ??                pixel
0017      unsigned byte   ??                pixel
…..       …..             …..               …..
xxxx      unsigned byte   ??                pixel

Reading images is slightly different from reading labels. The first 16 bytes contain metadata that is already known, so you can skip them and go straight to reading the images. Each image is represented as a 28×28 array of unsigned bytes. All that is required is to read one image at a time and store it in an array.

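SIZE_OF_ONE_IMAGE = 28 ** 2
images = []

# Go through the training file and read one image at a time
with gzip.open('train-images-idx3-ubyte.gz') as train_images:
    # Skip the 16-byte header (four 32-bit integers)
    train_images.read(4 * 4)
    for _ in range(60000):
        image = train_images.read(SIZE_OF_ONE_IMAGE)
        assert len(image) == SIZE_OF_ONE_IMAGE
        # Convert to NumPy and scale the pixel values to the range 0 to 1
        image_np = np.frombuffer(image, dtype='uint8') / 255
        images.append(image_np)

images = np.array(images)
print(images.shape)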

Output:

(60000, 784)

There are 60,000 images in the array. Each of them is represented by a vector of SIZE_OF_ONE_IMAGE (784) values scaled to the range 0 to 1. Let’s try to plot an image using the matplotlib library:

def plot_image(pixels: np.array):
    plt.imshow(pixels.reshape((28, 28)), cmap='gray')
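As an alternative to reading one image at a time, the image-file layout from the table above can be parsed in a single pass. The sketch below is an assumption-laden illustration rather than the article's method: the function name parse_idx3_images is hypothetical, and it is exercised on a synthetic file with two 2×2 images so that it runs without the real download.

```python
import struct
import numpy as np

def parse_idx3_images(raw: bytes) -> np.ndarray:
    """Return an array of shape (count, rows * cols) with values in 0..1."""
    # Four big-endian unsigned 32-bit integers make up the 16-byte header.
    magic, count, rows, cols = struct.unpack('>IIII', raw[:16])
    if magic != 2051:
        raise ValueError(f'unexpected magic number: {magic}')
    # Interpret everything after the header as unsigned bytes.
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    if pixels.size != count * rows * cols:
        raise ValueError('pixel data does not match the header')
    return pixels.reshape(count, rows * cols) / 255

# Synthetic file: two 2x2 images instead of 60,000 28x28 images.
sample = struct.pack('>IIII', 2051, 2, 2, 2) + bytes([0, 255, 51, 102,
                                                      255, 255, 0, 0])
tiny_images = parse_idx3_images(sample)
print(tiny_images.shape)
```

For the real file, the decompressed bytes from gzip.open(...).read() would take the place of sample.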


Encoding image labels with one-hot encoding

We will use one-hot encoding to turn the target labels into a vector.

labels_np = np.array(labels).reshape((-1, 1))

encoder = OneHotEncoder(categories='auto')
labels_np_onehot = encoder.fit_transform(labels_np).toarray()

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]])
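To see what the encoder produces, the same idea can be sketched with NumPy alone. This is an illustration of one-hot encoding, not a description of how OneHotEncoder works internally, and the three example labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for label k.
    return np.eye(num_classes)[labels]

encoded = one_hot([5, 0, 4])            # made-up labels
decoded = np.argmax(encoded, axis=1)    # position of the 1 recovers the label

print(encoded.shape)
print(decoded)
```

Each row contains exactly one 1, so taking the index of the largest element in a row gives back the original label.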

We have successfully created the input data and the output vectors that will be fed to the input and output layers of the neural network. The input vector at index i corresponds to the output vector at index i. Input data:

labels_np_onehot[999]

Output:

array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.])

The image with index 999 represents the digit 6, which can be checked by plotting images[999] with plot_image. The vector associated with it contains 10 elements (because there are 10 labels), and the element at index 6 is 1. This means that the label is correct.

Splitting the dataset into training and test sets

To check that the neural network has been trained correctly, we set aside a certain percentage of the training set (60,000 images) and use it for testing purposes. Input data:

X_train, X_test, y_train, y_test = train_test_split(images, labels_np_onehot)
print(y_train.shape)
print(y_test.shape)

Output:

(45000, 10)
(15000, 10)

Here you can see that the entire set of 60,000 images is split into two: one with 45,000 images and one with 15,000 images. This 75/25 ratio is what train_test_split uses by default when no test size is given.
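What such a split does can be sketched with NumPy alone. This is not scikit-learn's implementation, only a minimal illustration of a shuffled 75/25 split; the function name shuffled_split, the seed, and the demo arrays are all hypothetical.

```python
import numpy as np

def shuffled_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then cut them into test and training parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # The same indices are used for X and y, so pairs stay aligned.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Demo data: 60 rows, where row i is [2i, 2i + 1] and its label is i.
X_demo = np.arange(120).reshape(60, 2)
y_demo = np.arange(60)

X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

The important property is that inputs and labels are shuffled together, so the vector at index i in the training inputs still corresponds to the label at index i.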

Training the neural network with Keras

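model = keras.Sequential()
model.add(keras.layers.Dense(input_shape=(SIZE_OF_ONE_IMAGE,), units=128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

# Assumption: the compile step is missing from this text; SGD with categorical cross-entropy is one plausible choice
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
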
Model: "sequential"
Layer (type)       Output Shape    Param #
dense (Dense)      (None, 128)     100480
dense_1 (Dense)    (None, 10)      1290
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
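The parameter counts in the summary can be checked by hand. A fully connected layer has one weight per input-to-neuron connection plus one bias per neuron, which the following short calculation reproduces (the helper name dense_params is hypothetical):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 128)   # 784 inputs into 128 neurons
output = dense_params(128, 10)        # 128 inputs into 10 neurons
total = hidden + output

print(hidden, output, total)
```

The three numbers match the Param # column and the total reported by the summary: 100480, 1290, and 101,770.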

To train the neural network, run this code:

model.fit(X_train, y_train, epochs=20, batch_size=128)


Train on 45000 samples
Epoch 1/20
45000/45000 [==============================] - 2s 54us/sample - loss: 1.3391 - accuracy: 0.6710
Epoch 2/20
45000/45000 [==============================] - 2s 39us/sample - loss: 0.6489 - accuracy: 0.8454
Epoch 20/20
45000/45000 [==============================] - 2s 40us/sample - loss: 0.2584 - accuracy: 0.9279

We check the accuracy on the test data.

model.evaluate(X_test, y_test)


[0.2567395991722743, 0.9264]

Let’s see the results

So you have trained a neural network to recognize handwritten digits with an accuracy of over 90%. Let’s test it with an image from the test set. Let’s take an arbitrary image, the one with index 1010, and look at its expected label (in this case it is 4, because the element at index 4, the fifth position, is 1). Input data:

y_test[1010]

Output:

array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])

Let’s plot the corresponding image.


Understanding the output of the activation layer

Let’s run the digit through the neural network and see what output it predicts. Input data:

predicted_results = model.predict(X_test[1010].reshape((1,-1)))

The output of the softmax layer is a probability distribution over the possible outputs. In this case there are 10 of them (the digits from 0 to 9), but each image is expected to match only one. Since this is a probability distribution, the values sum to approximately 1.
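Why the outputs behave like probabilities follows from the softmax formula itself: each raw score is exponentiated and divided by the sum of all exponentials. A minimal NumPy sketch, with made-up raw scores:

```python
import numpy as np

def softmax(scores):
    """Exponentiate the scores and normalize them so they sum to 1."""
    shifted = scores - np.max(scores)   # for numerical stability only
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # made-up raw outputs of a layer
probs = softmax(scores)

print(probs)
print(probs.sum())
```

Every value is positive, the largest score gets the largest probability, and the values add up to 1 (up to floating-point rounding).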


Reading the output of the softmax layer for a particular digit

As you can see below, the fifth element (index 4) is indeed close to 1 (0.99), which means that the digit is very likely to be 4… and it is!

array([[1.2202066e-06, 3.4432333e-08, 3.5151488e-06, 1.2011528e-06, 9.9889344e-01, 3.5855610e-05, 1.6140550e-05, 7.6822333e-05, 1.0446112e-04, 8.6736667e-04]], dtype=float32)
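Reading the predicted digit off such an output does not have to be done by eye. The sketch below copies the values printed above into an array and uses np.argmax to find the most probable digit; the variable names are only illustrative.

```python
import numpy as np

# The softmax output shown above, copied by hand.
predicted = np.array([[1.2202066e-06, 3.4432333e-08, 3.5151488e-06,
                       1.2011528e-06, 9.9889344e-01, 3.5855610e-05,
                       1.6140550e-05, 7.6822333e-05, 1.0446112e-04,
                       8.6736667e-04]], dtype=np.float32)

digit = int(np.argmax(predicted, axis=1)[0])   # index of the largest value
confidence = float(predicted[0, digit])        # probability of that digit
total = float(predicted.sum())                 # should be close to 1

print(digit, confidence, total)
```

The index of the largest probability is 4, which is the predicted digit.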

Viewing the confusion matrix

predicted_outputs = np.argmax(model.predict(X_test), axis=1)
expected_outputs = np.argmax(y_test, axis=1)

predicted_confusion_matrix = confusion_matrix(expected_outputs, predicted_outputs)
predicted_confusion_matrix

Output:

array([[1402,    0,    4,    3,    1,    6,   20,    2,   21,    2],
       [   1, 1684,    9,    5,    4,    9,    1,    3,    9,    3],
       [  13,    8, 1280,    9,   19,    5,   12,   15,   17,    8],
       [   6,    8,   37, 1404,    1,   53,    3,   17,   33,   15],
       [   4,    7,    8,    0, 1345,    1,   18,    3,    8,   54],
       [  17,    8,    9,   31,   25, 1157,   25,    3,   24,   12],
       [   9,    6,   10,    0,   10,   12, 1431,    0,    6,    1],
       [   3,   11,   17,    4,   23,    2,    1, 1484,    5,   40],
       [  11,   16,   24,   40,    9,   25,   13,    3, 1348,   25],
       [   5,    5,    6,   16,   31,    6,    0,   43,    7, 1381]])
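In a confusion matrix produced this way, rows correspond to the true labels and columns to the predicted labels, so correct predictions lie on the diagonal. The sketch below shows how a few summary numbers can be read from such a matrix; it uses a small made-up 3×3 matrix rather than the one above.

```python
import numpy as np

# Made-up confusion matrix: rows = true labels, columns = predicted labels.
cm = np.array([[50,  2,  3],
               [ 4, 40,  6],
               [ 1,  9, 35]])

# Overall accuracy: correct predictions (diagonal) over all predictions.
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: correct predictions over all samples of that true class.
recall_per_class = np.diag(cm) / cm.sum(axis=1)

# Most frequent mistake: the largest off-diagonal cell.
mistakes = cm.copy()
np.fill_diagonal(mistakes, 0)
true_label, predicted_label = np.unravel_index(np.argmax(mistakes), mistakes.shape)

print(accuracy)
print(recall_per_class)
print(true_label, predicted_label)
```

In this toy matrix, the most common error is a true label 2 predicted as 1. The same calculations can be applied to the 10×10 matrix to see which digits the network confuses most often.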

Visualizing the data

# this is the code from

def plot_confusion_matrix(cm, classes,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function plots the confusion matrix.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Write each count into its cell, in a color that contrasts with the cell
    fmt = 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment='center',
                 color='white' if cm[i, j] > thresh else 'black')

    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Compute confusion matrix
class_names = [str(idx) for idx in range(10)]
cnf_matrix = confusion_matrix(expected_outputs, predicted_outputs)

# Plot non-normalized confusion matrix
plot_confusion_matrix(cnf_matrix, classes=class_names,
                      title='Confusion matrix, without normalization')


Over the course of this tutorial, you should have come to understand the basic concepts behind machine learning, and you should have learned how to:

  • Read and decode the images in the MNIST dataset
  • Encode categorical values using “one-hot encoding”
  • Define a neural network with one hidden layer and an output layer that uses the softmax activation function
  • Examine the output of the softmax activation function
  • Build a confusion matrix for a classifier

The scikit-learn and Keras libraries have significantly lowered the barrier to entry into machine learning, just as Python lowered the barrier to entry into programming. However, it will still take years (or decades) to reach expert level! Programmers with machine learning skills are in high demand. With the help of these libraries and introductory material about the practical aspects of machine learning, everyone should have the opportunity to become familiar with this area of knowledge, even without theoretical knowledge of the model, library, or framework. The skills then need to be put into practice by developing smarter products that make consumers more engaged.

Try it yourself

Here’s what you can try to do yourself to delve deeper into the world of machine learning with Python:

  • Experiment with the number of neurons in the hidden layer. Can you increase the accuracy?
  • Try adding more layers. Does it make the network train slower? Do you understand why?
  • Try RandomForestClassifier (you need the scikit-learn library) instead of the neural network. Has the accuracy increased?
