How To Detect Pneumonia using Convolutional Neural Network (CNN)
Part #01: Introduction
Pneumonia is an inflammation of the lungs caused by infection, and its symptoms range from mild to severe. Common symptoms include coughing up phlegm, fever, and shortness of breath. According to the WHO, pneumonia was one of the leading causes of death in children worldwide in 2019. In this session, we will learn how to predict pneumonia based on patient X-ray data.
The dataset in this case study contains chest X-ray images of patients, labeled as either normal or pneumonia. Our task is to create a deep learning model that predicts whether a patient has pneumonia. So, let's go to the next stage.
Part #02: Load Dataset
You can access the dataset we use via the following link: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia. Please download it first before moving on to the next step, or work directly on Kaggle. The first step is to import all the required libraries.
I worked on this project directly on Kaggle; if you are using a Jupyter notebook, install the required libraries first.
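Since the figures with the original code aren't reproduced here, a typical set of imports for this project might look like the following; the exact modules are an assumption based on the steps that follow:

```python
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2  # OpenCV, used below to read and resize the X-ray images

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint
from sklearn.metrics import confusion_matrix
```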
Next, we create a function to read the images in the dataset. The script in figure 2 walks the directory where we store the dataset and reads the images inside it. The reading process runs from lines 4–12: line 7 reads each image, while line 9 resizes it to 200 x 200 px. Note that the original images in this dataset are around 2,000 x 1,858 px; left at full size, they would burden our machine later on. Resizing them lightens your laptop's workload considerably.
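Figure 2 itself isn't shown here, so below is a minimal sketch of what such a function could look like; the name load_images and the use of OpenCV are assumptions, not the author's exact code:

```python
def load_images(folder_dir, size=(200, 200)):
    """Read every image in folder_dir and resize it to `size`."""
    images = []
    for filename in os.listdir(folder_dir):
        img = cv2.imread(os.path.join(folder_dir, filename))  # read one image
        if img is None:
            continue  # skip files that are not readable images
        img = cv2.resize(img, size)  # shrink to 200 x 200 px to save memory
        images.append(img)
    return np.array(images)
```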
After we create the function, we can load our dataset. We have two folders, train and test, and each of them contains two subfolders, NORMAL and PNEUMONIA.
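Assuming the standard Kaggle layout of this dataset and the hypothetical load_images helper above, loading the four image groups might look like this:

```python
# Paths follow the Kaggle chest-xray-pneumonia folder structure
train_normal    = load_images('chest_xray/train/NORMAL')
train_pneumonia = load_images('chest_xray/train/PNEUMONIA')
test_normal    = load_images('chest_xray/test/NORMAL')
test_pneumonia = load_images('chest_xray/test/PNEUMONIA')
```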
If we run the code in figures 3 and 4, the results are slightly different. Both show the number of images in each folder, but figure 4 gives more information: the dimensions (200, 200) and the number of color channels (3). The 3 channels represent RGB. You can see a detailed explanation in figure 5.
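The difference between the two outputs likely comes down to printing a count versus a full array shape, roughly like this:

```python
print(len(train_normal))    # count only, as in figure 3
print(train_normal.shape)   # e.g. (1341, 200, 200, 3): count, height, width, RGB channels
```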
You can display sample 'normal' and 'pneumonia' images by typing the code in figures 6 and 7.
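A plausible sketch of that plotting code; the 2 x 4 grid matches the 8 images mentioned below:

```python
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, img in zip(axes.flatten(), train_normal[:8]):
    ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR; convert for display
    ax.axis('off')
plt.suptitle('NORMAL samples')
plt.show()
```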
The code above displays 8 normal X-ray images and 8 pneumonia X-ray images. As I don't have much medical knowledge, I simply assume that patients with pneumonia have a 'blurrier' X-ray than normal patients. You can see the details in figures 8 and 9.
Part #03: Data Preprocessing
Well, we got a few insights from our dataset. Next, we will build the training and testing data. We will use vstack to combine the NORMAL and PNEUMONIA images: x_train will contain the images from the NORMAL and PNEUMONIA subfolders of the 'train' folder, and x_test will contain the images from the NORMAL and PNEUMONIA subfolders of the 'test' folder.
After you run the code above, you will see that the training data contains 5,216 images (1,341 normal + 3,875 pneumonia) and the testing data contains 624 images (234 normal + 390 pneumonia). Then we shuffle the training and testing data and split them back into x and y. For details, see figure 12 below.
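One way to implement the stacking, shuffling, and splitting described above; labeling NORMAL as 0 and PNEUMONIA as 1 is an assumption consistent with the thresholding discussed later:

```python
# Stack NORMAL and PNEUMONIA images into one array per split
x_train = np.vstack((train_normal, train_pneumonia))  # (5216, 200, 200, 3)
x_test  = np.vstack((test_normal, test_pneumonia))    # (624, 200, 200, 3)

# Label NORMAL as 0 and PNEUMONIA as 1
y_train = np.hstack((np.zeros(len(train_normal)), np.ones(len(train_pneumonia))))
y_test  = np.hstack((np.zeros(len(test_normal)),  np.ones(len(test_pneumonia))))

# Shuffle images and labels with the same permutation so pairs stay aligned
idx = np.random.permutation(len(x_train))
x_train, y_train = x_train[idx], y_train[idx]
idx = np.random.permutation(len(x_test))
x_test, y_test = x_test[idx], y_test[idx]
```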
Next, we will use ImageDataGenerator to apply some modifications to our images. ImageDataGenerator is a Keras utility that provides an easy way to modify images; available techniques include standardization, rotation, shifts, flips, brightness changes, and more. We use these techniques to create more variation in our images. If you look closely, most of our dataset is in a frontal view and upright position. So what happens when we encounter data captured from a different view? Will the model still learn well? That's why we use ImageDataGenerator: it lets the algorithm learn from images in varied positions. In this project, I use rotation, zoom, and shift.
There are no absolute rules for configuring the value of each parameter (lines 2–5), so you can customize them; then we fit the 'datagen'.
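The exact augmentation values in the figure aren't visible here, so the numbers below are placeholders you can tune:

```python
datagen = ImageDataGenerator(
    rotation_range=15,       # rotate up to ±15 degrees
    zoom_range=0.1,          # zoom in or out by up to 10%
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
)
datagen.fit(x_train)
```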
Part #04: Create Model
Then we move to the modelling part. The first step is to define the input. Because we resized the images to 200 x 200 pixels at the beginning, we use the same dimensions for the input shape.
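With Keras' functional API, that is a single line:

```python
inputs = Input(shape=(200, 200, 3))  # height, width, RGB channels
```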
Next, we define the convolutional layers. Before we go to the code, I will explain how a Convolutional Neural Network (CNN) works. A CNN is an AI method used for visual and image datasets, generally to detect objects in an image. It consists of convolutional layers, activation functions, pooling layers, and fully connected layers. A CNN uses the convolution operation to slide a specific kernel (filter) over an image; the computer then obtains new, representative information from multiplying the filter with each patch of the image it covers.
Okay, back to the code: we can create the convolutional layers by typing the code below.
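Figure 17 isn't reproduced here, so below is a sketch consistent with the description that follows; padding='same' is an assumption that matches the roughly 32 million parameters reported later:

```python
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)  # line 1: 16 channels
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)       # line 2: 32 channels
x = MaxPooling2D(pool_size=(2, 2))(x)                              # line 3: max pooling
```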
We use the relu activation because it creates a boundary at 0: if x ≤ 0 then the output is 0, and if x > 0 then the output is x. In figure 17, line 1 applies a 3x3 convolution to the 200x200 images with 3 color channels (RGB) and produces an output with 16 channels; the second line has more channels (32). We start with a small number of channels because the early layers capture general patterns, and then use more channels to extract finer details.
We use the pooling layer (line 3) to downsample the input and reduce the number of parameters. The method we use here is max pooling, but it's also possible to use other methods such as average pooling or L2-norm pooling.
Next, we create a Flatten layer to convert the multidimensional feature maps into a one-dimensional array before continuing to the fully connected layers (Dense()).
The dense layers begin with 100 neurons and then decrease to 50. Again, there are no absolute rules for choosing the number of neurons, but we typically start with a larger number and decrease it layer by layer. The output layer has 1 neuron and uses sigmoid as its activation. Unlike relu, sigmoid takes a single number x and squashes it to a value between 0 and 1.
We also set a dropout rate of 0.5, which randomly disables half of the neurons during training to reduce overfitting. Separately, a 0.5 threshold is applied to the sigmoid output: if the output is higher than 0.5, it is rounded to 1 and categorized as pneumonia; otherwise it is rounded to 0 and categorized as normal. The full stack of layers is sketched below.
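Collecting the layers just described into code, a sketch might look like this; the position of the Dropout layer between the dense layers is an assumption:

```python
x = Flatten()(x)                      # 100 x 100 x 32 feature maps -> 320,000-element vector
x = Dense(100, activation='relu')(x)  # first fully connected layer
x = Dense(50, activation='relu')(x)   # second fully connected layer
x = Dropout(0.5)(x)                   # randomly drop half the activations while training
outputs = Dense(1, activation='sigmoid')(x)  # probability that the image shows pneumonia
```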
Next, we build the model by specifying its input and output. Don't forget to display the summary to understand the model.
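In the functional API this takes two lines:

```python
model = Model(inputs=inputs, outputs=outputs)
model.summary()  # with the layers sketched above, roughly 32 million parameters
```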
You can see the result in figure 22 below. Our model has about 32 million parameters. With that many parameters, it's recommended to use Kaggle or Google Colab to take advantage of their GPUs.
Then we compile our model before training it. We use binary_crossentropy as the loss function because this is a binary classification (0 or 1). We also set an optimizer and track accuracy as the metric.
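The article doesn't name the optimizer, so adam below is a common default rather than the author's confirmed choice:

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```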
After compiling, we can create a checkpoint to save the best model from the training run. The checkpoint can also save you retraining time: results can vary from run to run, so it's better to keep the best model. Type the script below to create it.
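A sketch using Keras' ModelCheckpoint callback; the filename is an assumption:

```python
checkpoint = ModelCheckpoint(
    'best_model.h5',         # hypothetical filename for the saved weights
    monitor='val_accuracy',  # keep the epoch with the best validation accuracy
    save_best_only=True,
    verbose=1,
)
```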
Next, we move to the training process. I use 30 epochs and set batch_size to 32. There are no absolute rules for the number of epochs, so you can adjust it yourself.
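A training call consistent with those settings might look like this; using the test set for validation is an assumption based on the validation curves discussed in Part #05:

```python
history = model.fit(
    datagen.flow(x_train, y_train, batch_size=32),  # augmented batches of 32
    validation_data=(x_test, y_test),
    epochs=30,
    callbacks=[checkpoint],  # saves the best weights as training runs
)
```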
If you don’t use GPU on this process, it will spent much time, so please be patient. Once it done, you can type the code on figure 26 to save the result.
Part #05: Evaluate Model
Finally, we've made it to the last part. After finishing the training step, we can evaluate our model to check its performance. First, we look at the loss and accuracy across all epochs.
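Plotting the curves from the history object returned by fit() could look like this:

```python
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch'); plt.ylabel('loss'); plt.legend(); plt.show()

plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend(); plt.show()
```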
Based on figure 28, the loss on the validation data is already low from the beginning. The accuracy behaves differently: as we see in figure 30, the validation accuracy varies noticeably from epoch 1 to 30, with a high of 0.83 and a low of 0.77. Next, we can verify the prediction counts using a confusion matrix. Before we do, we load the best model back by typing the code below.
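Reloading the checkpointed weights is a single call:

```python
model = load_model('best_model.h5')  # same hypothetical filename used by the checkpoint
```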
Then we create a function that computes the confusion matrix and shows the result for the training data and validation data.
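A sketch of such a function using scikit-learn; the 0.5 threshold mirrors the sigmoid rounding described earlier:

```python
def show_confusion_matrix(model, x, y, threshold=0.5):
    """Predict on x, binarize with the threshold, and print the confusion matrix."""
    y_pred = (model.predict(x) > threshold).astype(int).ravel()
    cm = confusion_matrix(y, y_pred)
    print(cm)
    return cm

cm_train = show_confusion_matrix(model, x_train, y_train)
cm_test  = show_confusion_matrix(model, x_test, y_test)
```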
A confusion matrix is made up of 4 components: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). A sample is a true positive when actually positive data is also predicted as positive: in this case, if the actual data says the patient has pneumonia and the model also predicts pneumonia, it is a true positive. In contrast, if the actual data is pneumonia and the model predicts normal, it is a false negative. A true negative is negative data predicted as negative by the model: the patient is normal and the model also says normal. Finally, when negative data is predicted as positive by the model, it is a false positive.
We can calculate the evaluation metrics from the confusion matrix using the formulas below.
Accuracy = (TP+TN)/(TP+FP+TN+FN)
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
F1 Score = 2 × (Precision × Recall)/(Precision + Recall)
· Accuracy: measures how often the model's predictions are correct overall.
· Precision: measures how many of the samples predicted as positive are actually positive.
· Recall: measures how many of the actual positive samples the model successfully finds.
· F1 Score: the harmonic mean of precision and recall.
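Translating those formulas into code against the test-set confusion matrix computed above:

```python
tn, fp, fn, tp = cm_test.ravel()  # scikit-learn orders the matrix as [[TN, FP], [FN, TP]]

accuracy  = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1_score  = 2 * (precision * recall) / (precision + recall)

print(f'Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, '
      f'Recall: {recall:.2f}, F1 Score: {f1_score:.2f}')
```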
Accuracy is recommended as a performance measure when the numbers of false negatives and false positives are close. When they are not, the F1 Score is a better reference.
Based on figure 34, we get the following results:
Accuracy : 0.92
Precision : 0.95
Recall : 0.94
F1 Score : 0.94
It looks like we have high evaluation scores on the training data. Next, we do the same for the testing data.
Accuracy : 0.77
Precision : 0.75
Recall : 0.95
F1 Score : 0.83
Looking at these results, the scores are not all close to each other. Since the amounts of normal and pneumonia data are imbalanced, we should not rely on the accuracy score; instead, we focus on the F1 Score. The F1 Score for the testing data is 0.83, which is good enough for this case.
Then we move to the last step of this project: predicting the data in the val folder. We can load the dataset first with the following code.
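Reusing the hypothetical load_images helper from Part #02:

```python
val_normal    = load_images('chest_xray/val/NORMAL')
val_pneumonia = load_images('chest_xray/val/PNEUMONIA')
```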
Then we predict the data.
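Applying the same 0.5 threshold as before, 0 means normal and 1 means pneumonia:

```python
pred_normal    = (model.predict(val_normal) > 0.5).astype(int).ravel()
pred_pneumonia = (model.predict(val_pneumonia) > 0.5).astype(int).ravel()
```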
Next, we visualize the result using the code below. The data we visualize comes from the NORMAL and PNEUMONIA subfolders.
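A sketch of that visualization for the NORMAL subfolder; in this dataset the val folder holds 8 images per class, so a 2 x 4 grid fits:

```python
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for ax, img, pred in zip(axes.flatten(), val_normal, pred_normal):
    ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    ax.set_title('pneumonia (1)' if pred == 1 else 'normal (0)')  # predicted label
    ax.axis('off')
plt.suptitle('Predictions on the NORMAL validation images')
plt.show()
```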
Based on the results in figures 38 and 39, we can conclude that the model still sometimes fails on the normal data: as you can see in figure 38, some normal images are predicted as pneumonia (1). On the pneumonia data, however, the model predicts all of the images correctly. If you remember, at the beginning we formed a hypothesis that pneumonia X-rays look blurrier than normal ones; we can now say that our model succeeds in identifying patients who show signs of pneumonia, despite some minor errors.