What is machine learning and why is it important?
- Fraud detection – tracking unusual patterns in credit card or bank account transactions
- Prediction – predicting the future price of stocks, currency exchange rates or cryptocurrencies
- Image recognition – identifying objects and faces in pictures
Machine learning is a huge field, and today we’ll talk about just one part of it.
Supervised learning
Supervised learning is one type of machine learning. The idea is that the system is first taught from past data: it is shown many examples of a particular problem together with the desired output. Once the system is “trained,” it can be given new input data and predict the output.
For example, how do you create a spam detector? One way is through intuition: you manually define rules such as “contains the word money” or “includes the phrase Western Union”. Such systems sometimes work, but most of the time it is hard to define patterns based solely on intuition. With supervised learning, you can instead train a system to learn the underlying rules and patterns by providing many examples of emails labeled as spam or not spam. Once such a detector is trained, it can be given a new email and predict whether it is spam.
Two types of problems are solved with supervised learning: regression and classification.
- In regression problems we try to predict continuous output. For example, predicting the price of a house based on data about its size
- In classification problems, we predict a discrete number of qualitative labels. For example, trying to predict whether an email is spam based on the number of words in it.
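To make the difference concrete, here is a minimal sketch in plain Python. The house sizes and prices are made-up numbers, and `fit_line` and `classify_email` are hypothetical helpers written only for this illustration; the fixed word-count threshold merely stands in for a rule that a trained classifier would learn from data.

```python
# Made-up toy data: house sizes (square meters) and prices (in thousands).
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


# Regression: the prediction is a continuous number.
a, b = fit_line(sizes, prices)
predicted_price = a * 100.0 + b


def classify_email(word_count, threshold=200):
    """Classification: the prediction is one of a fixed set of labels.

    The threshold is an arbitrary assumption, not a learned value.
    """
    return "spam" if word_count > threshold else "not spam"
```

The regression output can take any value on a continuous scale, while the classifier can only ever answer with one of its two labels.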
You can’t talk about supervised learning without touching on supervised learning models. It’s like talking about programming without touching on programming languages or data structures. Learning models are the structures that get trained: their weights (or structure) change as they form an understanding of what needs to be predicted. There are several kinds of learning models, such as:
- Random forest
- Naive Bayesian classifier (naive Bayes)
- Logistic regression
- The k nearest neighbors method
This material will use a neural network as a model.
Understanding how neural networks work
Neural networks are so named because their internal structure is meant to mimic the human brain, which consists of neurons and the synapses that connect them. When stimulated, neurons “activate” other neurons by means of electricity. An artificial neuron is “activated” by calculating a weighted sum of its inputs and passing the result through an activation function. When a neuron is activated, it in turn activates others, which perform similar calculations, causing a chain reaction across the neurons of all layers. It is worth noting that although neural networks are inspired by biological networks, the two are not directly comparable.
This diagram illustrates the activation process that each neuron goes through. Consider the diagram from left to right.
- All inputs (numeric values) from incoming neurons are read. They are defined as x1…xn.
- Each input is multiplied by the weight associated with that connection. The weights are denoted as W1j…Wnj.
- All weighted inputs are summed and passed to the activation function, which transforms the sum into a new numerical value.
- As a result, the numerical value returned by this function will be the input for another neuron in another layer.
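The steps above can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid activation and the optional `bias` term are assumptions of the sketch (the description above does not name a specific activation function), and `neuron_output` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """One common activation function; it squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs x1..xn with weights W1j..Wnj, then activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


# Two inputs whose weighted sum is 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
activation = neuron_output([1.0, 2.0], [0.5, -0.25])
```

The returned value would then be one of the inputs of every neuron that this neuron connects to in the next layer.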
Layers of a neural network
Neurons within a neural network are organized into layers, each of which contains one or more neurons. A neural network usually has 3 or more layers. Two special layers are always defined: the input layer and the output layer.
- The input layer is the entry point into the neural network. In terms of programming it can be thought of as a function argument.
- The output layer produces the result of the neural network. In programming terms it is the value returned by the function.
The layers between them are described as “hidden layers”. This is where all the calculations take place. All layers in the neural network are encoded as feature descriptions.
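The function analogy can be taken quite literally. Below is a minimal sketch of a network with one hidden layer, written as an ordinary Python function. All weights are made-up numbers chosen so the arithmetic is easy to follow, and `dense_layer` and `tiny_network` are hypothetical names used only for this illustration.

```python
def relu(z):
    """Activation that passes positive values and clips negatives to zero."""
    return max(0.0, z)


def identity(z):
    return z


def dense_layer(inputs, weights, biases, activation):
    """Compute one layer: weights[j] holds the incoming weights of neuron j."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def tiny_network(x):
    # The argument x plays the role of the input layer (2 values).
    hidden = dense_layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
    # The returned list plays the role of the output layer (1 value).
    return dense_layer(hidden, [[1.0, 2.0]], [0.0], identity)


result = tiny_network([2.0, 1.0])
```

For the input `[2.0, 1.0]` the hidden layer yields `[1.0, 1.5]`, and the output neuron combines them into `1.0 * 1.0 + 2.0 * 1.5 = 4.0`.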
Choosing the number of hidden layers and neurons
There is no golden rule to follow when choosing the number of layers and their size (the number of neurons). As a rule, it’s worth trying at least one hidden layer first and then adjusting its size to see what works best.
Using the Keras library to train a simple neural network that recognizes handwritten digits
Python programmers don’t need to reinvent the wheel. Libraries such as TensorFlow, Torch, Theano, and Keras already define the basic data structures of a neural network, so all that remains is to describe the network’s structure declaratively. Keras also leaves some freedom: you can choose the number of layers, the number of neurons, the type of each layer, and the activation function. In practice there are many more options, but simpler examples will do this time.
As already mentioned, the sizes of two special layers are dictated by the problem: the input layer and the output layer. All the “hidden layers” in between are used to learn the complex nonlinear abstractions of the problem.
In this tutorial, we will use Python and the Keras library to predict handwritten digits from the MNIST database.
Running Jupyter Notebook locally
If you haven’t worked with Jupyter Notebook yet, check out the Jupyter Notebook Beginner’s Guide first.
List of required libraries:
Running from the Python interpreter
To run the code on a clean installation of Python (version 3.6 or newer), install the required modules using pip.
I recommend (though it is not required) running the code in a virtual environment.
!pip install matplotlib
!pip install scikit-learn
!pip install tensorflow
If these modules are installed, you can now run all the code in the project. Import modules and libraries:
import numpy as np
import matplotlib.pyplot as plt
import gzip
from typing import List
from sklearn.preprocessing import OneHotEncoder
import tensorflow.keras as keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools
%matplotlib inline
MNIST is a large database of handwritten digits that is used as a benchmark and as an entry point to machine learning and image processing systems. It is ideally suited for focusing specifically on the neural network training process. MNIST is a very clean database, which is a luxury in the world of machine learning.
The goal
The goal is to train the system to classify each image with the appropriate label (the digit pictured), using a dataset of 60,000 handwritten digit images. Each one is a 28×28 pixel image whose pixels are grayscale values from 0 to 255.
The dataset consists of training and test data, but for simplicity only the training data will be used here. Here’s how to load it:
%%bash
rm -Rf train-images-idx3-ubyte.gz
rm -Rf train-labels-idx1-ubyte.gz
wget -q http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget -q http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
There are 10 digits (0-9), so each label must be a digit from 0 to 9. The downloaded file, train-labels-idx1-ubyte.gz, encodes the labels as follows:
Training Set Label File (train-labels-idx1-ubyte):
|offset||type||value||description|
|0000||32 bit integer||0x00000801(2049)||magic number (MSB first)|
|0004||32 bit integer||60000||number of items|
Label values range from 0 to 9. The first 8 bytes (the first two 32-bit integers) can be skipped: they contain file metadata that is mostly of use in lower-level programming languages. To parse the file, you need to do the following:
- Open the file with the gzip library so it can be decompressed
- Read the entire array of bytes into memory
- Skip the first 8 bytes
- Go through each byte and cast it to an integer
Note: if this file came from an untrusted source, many more checks would be needed. But let’s assume that this particular one is reliable and suitable for the purpose of this material.
with gzip.open('train-labels-idx1-ubyte.gz') as train_labels:
    data_from_train_file = train_labels.read()

# skip first 8 bytes
label_data = data_from_train_file[8:]
assert len(label_data) == 60000

# Convert each byte to an integer
# This will be a number between 0 and 9
labels = [int(label_byte) for label_byte in label_data]
assert min(labels) == 0 and max(labels) == 9
assert len(labels) == 60000
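If the file could not be trusted, the header would be worth checking rather than skipping. The sketch below shows one possible way to do that with the standard `struct` module. `parse_label_file` is a hypothetical helper, not part of any library, and the three-label "file" is built by hand purely for illustration.

```python
import struct

LABEL_MAGIC = 2049  # 0x00000801, the magic number of the label file


def parse_label_file(raw: bytes) -> list:
    """Validate the 8-byte header, then return the labels as integers."""
    if len(raw) < 8:
        raise ValueError("file too short to contain a header")
    # ">II" reads two big-endian (MSB first) unsigned 32-bit integers
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    body = raw[8:]
    if len(body) != count:
        raise ValueError(f"header promises {count} labels, found {len(body)}")
    labels = list(body)  # iterating over bytes yields integers
    if any(label > 9 for label in labels):
        raise ValueError("label outside the range 0-9")
    return labels


# Hand-built stand-in for a decompressed label file with three labels
fake_file = struct.pack(">II", LABEL_MAGIC, 3) + bytes([5, 0, 4])
parsed = parse_label_file(fake_file)
```

With the real dataset, the bytes returned by `gzip.open(...).read()` could be passed to the same function in place of `fake_file`.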
Training Set Image File (train-images-idx3-ubyte):
|offset||type||value||description|
|0000||32 bit integer||0x00000803(2051)||magic number|
|0004||32 bit integer||60000||number of images|
|0008||32 bit integer||28||number of rows|
|0012||32 bit integer||28||number of columns|
Reading images is slightly different from reading labels. The first 16 bytes contain already known metadata. You can skip them and go straight to reading images. Each of them is represented as a 28*28 array of unsigned bytes. All that is required is to read one image at a time and store them in the array.
SIZE_OF_ONE_IMAGE = 28 ** 2
images = []

# Go through the training file and read one image at a time
with gzip.open('train-images-idx3-ubyte.gz') as train_images:
    # skip the 16-byte header (four 32-bit integers)
    train_images.read(4 * 4)
    for _ in range(60000):
        image = train_images.read(size=SIZE_OF_ONE_IMAGE)
        assert len(image) == SIZE_OF_ONE_IMAGE

        # convert to NumPy and scale pixel values to the range [0, 1]
        image_np = np.frombuffer(image, dtype='uint8') / 255
        images.append(image_np)

images = np.array(images)
images.shape
(60000, 784)
There are 60000 images in the list. Each of them is represented by a vector of size SIZE_OF_ONE_IMAGE. Let’s try to plot an image using the matplotlib library:
def plot_image(pixels: np.array):
    plt.imshow(pixels.reshape((28, 28)), cmap='gray')
    plt.show()

# any single image works here; index 0 is an arbitrary choice
plot_image(images[0])
Encoding image labels with one-hot encoding
We will use one-hot encoding to turn the target labels into a vector.
labels_np = np.array(labels).reshape((-1, 1))

encoder = OneHotEncoder(categories='auto')
labels_np_onehot = encoder.fit_transform(labels_np).toarray()

labels_np_onehot
array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]])
We have successfully created the input data and the output vectors that will be fed to the input and output layers of the neural network. The input vector with index i corresponds to the output vector with index i. Input data:
labels_np_onehot[999]
array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.])
In the example above you can clearly see that the image with index 999 represents the digit 6. The vector associated with it contains 10 elements (because there are 10 labels), and the element at index 6 is 1. This means that the label is correct.
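To make the encoding itself concrete, here is a minimal sketch of one-hot encoding and decoding in plain Python, without scikit-learn. `one_hot` and `decode` are hypothetical helper names used only for this illustration.

```python
def one_hot(label: int, num_classes: int = 10) -> list:
    """Return a vector of zeros with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector: list) -> int:
    """Recover the label: it is the index of the largest element."""
    return vector.index(max(vector))


encoded = one_hot(6)
decoded = decode(encoded)
```

Here `encoded` has a 1 at index 6 and zeros everywhere else, and `decode` maps it back to the digit 6.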
Splitting the dataset into training and test sets
To check that the neural network has been trained correctly, we take a certain percentage of the training set (60,000 images) and use it for testing purposes. Input data:
X_train, X_test, y_train, y_test = train_test_split(images, labels_np_onehot)

print(y_train.shape)
print(y_test.shape)
(45000, 10)
(15000, 10)
Here you can see that the entire set of 60,000 images is split into two: one with 45,000 images and one with 15,000 images.
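The 45,000 / 15,000 proportion comes from the fact that `train_test_split` holds out 25% of the samples by default when no size is given. The idea behind the split can be sketched in plain Python as follows. `simple_train_test_split` is a hypothetical simplification: the real function also shuffles several arrays (such as images and labels) with the same permutation so that they stay aligned.

```python
import math
import random


def simple_train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle the indices, then cut off the first part as the test set."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_test = math.ceil(len(items) * test_fraction)
    test_indices = indices[:n_test]
    train_indices = indices[n_test:]
    train = [items[i] for i in train_indices]
    test = [items[i] for i in test_indices]
    return train, test


# Stand-in for the 60,000 samples: just their index numbers
train, test = simple_train_test_split(list(range(60000)))
```

Every sample ends up in exactly one of the two parts, so the model is later tested only on images it never saw during training.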
Training the neural network with Keras
model = keras.Sequential()
model.add(keras.layers.Dense(input_shape=(SIZE_OF_ONE_IMAGE,), units=128, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))

model.summary()

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
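The parameter counts in the summary can be checked by hand: each neuron of a fully connected layer has one weight per input plus one bias. A small sketch of that arithmetic, with `dense_param_count` as a hypothetical helper name:

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (one per input for every unit) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden_params = dense_param_count(28 * 28, 128)  # 784 pixels into 128 neurons
output_params = dense_param_count(128, 10)       # 128 neurons into 10 digits
total_params = hidden_params + output_params
```

This gives 100480 for the hidden layer, 1290 for the output layer, and 101770 in total, matching the summary above.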
To train the neural network, run this code.
model.fit(X_train, y_train, epochs=20, batch_size=128)
Train on 45000 samples
Epoch 1/20
45000/45000 [==============================] - 2s 54us/sample - loss: 1.3391 - accuracy: 0.6710
Epoch 2/20
45000/45000 [==============================] - 2s 39us/sample - loss: 0.6489 - accuracy: 0.8454
...
Epoch 20/20
45000/45000 [==============================] - 2s 40us/sample - loss: 0.2584 - accuracy: 0.9279
We check the accuracy on the training data.
Let’s see the results
So you have trained your neural network to recognize handwritten digits with an accuracy of over 90%. Let’s test it with an image from the test set. Let’s take an arbitrary image, the one with index 1010, and look at its expected label (in this case it is 4, because the fifth position holds a 1).
array([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
Let’s plot the corresponding image.
Understanding the output of the activation layer
Let’s run the digit through the neural network and see what output it predicts. Input data:
predicted_results = model.predict(X_test[1010].reshape((1, -1)))
The output of the softmax layer is a probability distribution over the possible outputs. In this case there are 10 of them (the digits from 0 to 9), and each image is expected to match only one. Since this is a probability distribution, the values sum to approximately 1.
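For intuition, softmax itself fits in a few lines of plain Python. This is a sketch of the standard formula rather than the Keras implementation, and the three input scores are made-up numbers.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive values that sum to 1."""
    # Subtracting the maximum does not change the result,
    # but keeps exp() from overflowing on large scores.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
```

The largest score receives the largest probability, and the order of the inputs is preserved in the output.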
Reading the output of the softmax layer for a particular digit
As you can see below, the fifth element (index 4) is indeed close to 1 (0.99), which means the image is very likely a 4… and it is!
array([[1.2202066e-06, 3.4432333e-08, 3.5151488e-06, 1.2011528e-06, 9.9889344e-01, 3.5855610e-05, 1.6140550e-05, 7.6822333e-05, 1.0446112e-04, 8.6736667e-04]], dtype=float32)
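Reading the predicted digit out of such a vector amounts to finding the index of its largest value. A minimal sketch in plain Python, using the numbers printed above (the variable names are chosen for this illustration):

```python
# The softmax output shown above, copied as a plain list
predicted_row = [
    1.2202066e-06, 3.4432333e-08, 3.5151488e-06, 1.2011528e-06, 9.9889344e-01,
    3.5855610e-05, 1.6140550e-05, 7.6822333e-05, 1.0446112e-04, 8.6736667e-04,
]

# Index of the largest probability = the predicted digit
predicted_digit = max(range(len(predicted_row)), key=predicted_row.__getitem__)
confidence = predicted_row[predicted_digit]
```

This is the same operation that `np.argmax` performs on NumPy arrays.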
Viewing the confusion matrix
predicted_outputs = np.argmax(model.predict(X_test), axis=1)
expected_outputs = np.argmax(y_test, axis=1)

predicted_confusion_matrix = confusion_matrix(expected_outputs, predicted_outputs)
predicted_confusion_matrix
array([[1402,    0,    4,    3,    1,    6,   20,    2,   21,    2],
       [   1, 1684,    9,    5,    4,    9,    1,    3,    9,    3],
       [  13,    8, 1280,    9,   19,    5,   12,   15,   17,    8],
       [   6,    8,   37, 1404,    1,   53,    3,   17,   33,   15],
       [   4,    7,    8,    0, 1345,    1,   18,    3,    8,   54],
       [  17,    8,    9,   31,   25, 1157,   25,    3,   24,   12],
       [   9,    6,   10,    0,   10,   12, 1431,    0,    6,    1],
       [   3,   11,   17,    4,   23,    2,    1, 1484,    5,   40],
       [  11,   16,   24,   40,    9,   25,   13,    3, 1348,   25],
       [   5,    5,    6,   16,   31,    6,    0,   43,    7, 1381]],
      dtype=int64)
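A confusion matrix can also be read numerically. In the call above, rows correspond to the expected digits and columns to the predicted ones, so the diagonal holds the correct predictions. The sketch below copies the matrix as a plain list of lists and derives the overall accuracy and the most frequent mistake; the variable names are chosen for this illustration.

```python
confusion = [
    [1402, 0, 4, 3, 1, 6, 20, 2, 21, 2],
    [1, 1684, 9, 5, 4, 9, 1, 3, 9, 3],
    [13, 8, 1280, 9, 19, 5, 12, 15, 17, 8],
    [6, 8, 37, 1404, 1, 53, 3, 17, 33, 15],
    [4, 7, 8, 0, 1345, 1, 18, 3, 8, 54],
    [17, 8, 9, 31, 25, 1157, 25, 3, 24, 12],
    [9, 6, 10, 0, 10, 12, 1431, 0, 6, 1],
    [3, 11, 17, 4, 23, 2, 1, 1484, 5, 40],
    [11, 16, 24, 40, 9, 25, 13, 3, 1348, 25],
    [5, 5, 6, 16, 31, 6, 0, 43, 7, 1381],
]

# Correct predictions sit on the diagonal
correct = sum(confusion[i][i] for i in range(10))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# The largest off-diagonal cell is the most frequent mistake
worst_count, true_digit, predicted_as = max(
    (confusion[i][j], i, j)
    for i in range(10)
    for j in range(10)
    if i != j
)
```

For this matrix, 13916 of the 15000 test images are classified correctly (about 93%), and the most common error is a 4 being read as a 9, which happens 54 times.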
Visualizing the data
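# this is the code from https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print(cm)
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt), horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

# Compute confusion matrix
class_names = [str(idx) for idx in range(10)]
cnf_matrix = confusion_matrix(expected_outputs, predicted_outputs)
# Plot non-normalized confusion matrix
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_names,
                      title='Confusion matrix, without normalization')
plt.show()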
Over the course of this tutorial, you should have understood the basic concepts that form the basis of machine learning, and you should have learned how to:
- Encode and decode the images in the MNIST dataset
- Encode categorical values using “one-hot encoding”
- Define a neural network with one hidden layer and an output layer that uses the softmax activation function
- Examine the output of the softmax activation function
- Build a confusion matrix for a classifier
The scikit-learn and Keras libraries have significantly lowered the barrier to entry into machine learning, just as Python lowered the barrier to entry into programming. However, it will take years (or decades) to reach expert level! Programmers with machine learning skills are in high demand. With the help of these libraries and introductory material on the practical aspects of machine learning, everyone should have the opportunity to become familiar with this area of knowledge, even without theoretical knowledge of the model, library, or framework. The skills then need to be put into practice, developing smarter products that make consumers more engaged.
Try it yourself
Here’s what you can try to do yourself to delve deeper into the world of machine learning with Python:
- Experiment with the number of neurons in the hidden layer. Can you increase the accuracy?
- Try adding more layers. Does it make the network train slower? Do you understand why?
- Try RandomForestClassifier (you need the scikit-learn library) instead of the neural network. Has the accuracy increased?