AI Applications in Healthcare: Breast Cancer Prediction

By Luigi Vacca on October 11, 2023

Breast Cancer

Breast cancer is a serious disease that affects mostly women, but also men.

In 2020, the World Health Organization (WHO) estimated that there were approximately 685,000 breast cancer-related deaths globally.

Hence, breast cancer is one of the major causes of death among women.

The stage at which breast cancer is diagnosed is a crucial factor in determining its seriousness. Early-stage breast cancer, such as stage 0 and stage I, is typically more treatable and associated with a higher chance of survival. In contrast, advanced stages, such as stage III and IV, are generally more challenging to treat and have a lower survival rate.

There are different types of breast cancer, including invasive ductal carcinoma, invasive lobular carcinoma, and others. The specific type of breast cancer can impact treatment options and outcomes.

The size of the tumor, whether it has spread to nearby lymph nodes, and whether it has metastasized (spread to other parts of the body) are critical factors in assessing the seriousness of breast cancer.

Advances in medical research and treatment options have improved the prognosis for many breast cancer patients. Treatments may include surgery, chemotherapy, radiation therapy, hormone therapy, targeted therapy, and immunotherapy. The choice of treatment depends on the type and stage of breast cancer.

Breast cancer is treatable. Early detection through regular breast self-exams, clinical breast exams, and mammograms can significantly improve the chances of successful treatment and survival.

Hence, the ability of an algorithm to accurately predict the positive cases is of paramount importance.

Mammograms

A mammogram is an X-ray examination of the breast. It is used to diagnose breast cancer both in women who have breast problems, such as a lump, pain, or nipple discharge, and in women who undergo periodic check-ups.

AI Cancer Prediction

The initial step is to acquire the largest possible dataset of digitized mammogram X-ray or magnetic resonance images. Here, the presence of both positive samples (those with proven cancer) and negative samples (those for which no cancer, or no malignant cancer, was found) is of paramount importance.

Hence, a database ready for machine learning contains some samples labeled as malignant and others labeled as benign or cancer-free.

In general, the dataset is randomly divided into training and testing subsets, or into three sets (training, validation, and test), with the training set making up 60 to 80 percent of the total. The exact proportion may also depend on how many samples are available. The validation and test sets are usually of similar size.
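The random split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_dataset, the 70 percent training share, the fixed seed, and the toy dataset of (image_id, label) pairs are all hypothetical and not taken from any particular study.

```python
import random


def split_dataset(samples, train_frac=0.7, seed=0):
    """Shuffle the samples and split them into train, validation and test lists.

    Assumption: whatever is left after the training share is divided
    evenly between validation and test, so the two are of similar size.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(round(train_frac * len(shuffled)))
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical dataset: 100 (image_id, label) pairs, label 1 = malignant.
dataset = [(i, 1 if i % 4 == 0 else 0) for i in range(100)]
train, val, test = split_dataset(dataset, train_frac=0.7)
print(len(train), len(val), len(test))  # prints: 70 15 15
```

When positive samples are rare, a stratified split that keeps the proportion of positives the same in each subset is often preferred. The sketch above does not do this.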

The problem is one of binary classification: either the sample is positive for malignant cancer, or it is not (negative).

Features are extracted from a digitized image of a mass in the breast. These may be combined with the patient's clinical history, demographic information, genetic information, and previous diagnoses, along with characteristics of the area in question, such as its size, perimeter, and fractal dimension.

Performance of Model

A binary classifier has four possible outcomes: true positive (TP), false positive (FP), false negative (FN), and true negative (TN).

For breast cancer prediction, the most important outcome is the false negative (FN): a malignant case that is predicted as negative.
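As a minimal sketch, the four outcomes can be counted by comparing true labels with predicted labels. The labels below are made up for illustration, and the convention that 1 means malignant and 0 means not malignant is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # malignant, predicted malignant
        elif truth == 0 and pred == 1:
            fp += 1  # not malignant, predicted malignant
        elif truth == 1 and pred == 0:
            fn += 1  # malignant, predicted negative: the costly miss
        else:
            tn += 1  # not malignant, predicted negative
    return tp, fp, fn, tn


# Hypothetical labels for eight test samples (1 = malignant, 0 = not malignant).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # prints: 3 1 1 3
```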

There are several measures of performance, each a function of the four outcomes:

Precision and Recall are defined as follows:

Precision = TP / (TP + FP), where the TPs and FPs are counted over the test set.

Recall = TP / (TP + FN), counted in the same way. Recall is an important measure of performance when missing a positive case has particularly serious consequences, as in the case of breast cancer.

Accuracy = (TP+TN) / (TP+TN+FP+FN)
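The three formulas above translate directly into code. The counts used here are invented for illustration and do not come from any real study. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something the formulas themselves prescribe.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts from a test set of 200 samples.
tp_count, fp_count, fn_count, tn_count = 40, 10, 5, 145
p = precision(tp_count, fp_count)                     # 40 / 50
r = recall(tp_count, fn_count)                        # 40 / 45
a = accuracy(tp_count, tn_count, fp_count, fn_count)  # 185 / 200
print(round(p, 3), round(r, 3), round(a, 3))  # prints: 0.8 0.889 0.925
```

In this made-up example the accuracy is higher than the recall: 5 of the 45 malignant cases are still missed, which is why recall deserves separate attention.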

Another important measure of performance is the area under the curve (AUC) of the receiver operating characteristic (ROC).
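One way to compute the AUC without drawing the curve relies on a known equivalence: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition. The scores are hypothetical classifier outputs, and the function name roc_auc is chosen for this illustration only.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p_score in pos:
        for n_score in neg:
            if p_score > n_score:
                correct += 1.0
            elif p_score == n_score:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical labels and classifier scores (higher = more likely malignant).
labels = [1, 1, 0, 1, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
auc = roc_auc(labels, model_scores)
print(round(auc, 3))  # prints: 0.889
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration. Library implementations typically sort the scores instead.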

AI Models

A number of classifiers have been, and still are, used to predict breast cancer. Here we mention a few:

Nearest Neighbor Classifier.

Naïve Bayes Classifier.

Support Vector Machine Classifier.

AdaBoost and XGBoost, which are ensemble classifiers.

Convolutional Neural Networks (CNNs) are considered among the state of the art in computer vision.

CNNs are a machine learning technique and belong to the deep learning family of algorithms. They have an input layer, hidden layers, and an output layer, which in our case tells us whether the input is positive or negative.

In particular, convolutional networks perform convolutions, as the name suggests. In essence, a small filter (a low-dimensional feature) is slid across the larger image to find which regions most closely resemble it.
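The sliding operation can be illustrated on a tiny image. The sketch below computes, at each position, the sum of element-wise products between the filter and the image patch under it. Strictly speaking this is cross-correlation, which is what most CNN libraries implement under the name convolution. The 4x4 image and 2x2 filter are made-up values. In a real CNN the filter weights are learned during training rather than fixed by hand.

```python
def slide_filter(image, kernel):
    """Slide a small 2D kernel over a 2D image with no padding and stride 1.

    Each output value is the sum of element-wise products between the
    kernel and the image patch it currently covers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image with a bright 2x2 blob in the middle.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical 2x2 filter that responds to bright 2x2 regions.
kernel = [
    [1, 1],
    [1, 1],
]
response = slide_filter(image, kernel)
for row in response:
    print(row)
# prints:
# [1, 2, 1]
# [2, 4, 2]
# [1, 2, 1]
```

The response is largest (4) at the position where the filter sits exactly on the blob, which is the sense in which the filter finds the region that most resembles it.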

According to the article by Jafari, Zahra, and Ebrahim Karami. 2023. "Breast Cancer Detection in Mammography Images: A CNN-Based Approach with Feature Selection." Information 14, no. 7, a multi-layer CNN can classify magnetic resonance images as malignant or benign tumors using pixel information and online data augmentation. The highest accuracy achieved was about 98%.

Based on these results, machine learning (a branch of AI) can play a significant role in predicting and diagnosing breast cancer from images.
