Background
Morbidity and mortality from cervical cancer are significantly reduced when a cancerous lesion is discovered at an early stage. Women who live in low-resource areas can benefit from a single visit in which screening, diagnosis and initiation of treatment for cervical cancer all take place. Where an expert gynecologist is not available to these women, machine learning algorithms can support the assessment and could potentially save lives.
Malignant lesions usually arise in the transformation zone of the cervix. This is a circular area in which columnar epithelium is replaced by squamous epithelium. When assessing the cervix visually, it is important to determine whether the transformation zone is visible. If part or all of the transformation zone is hidden, visual inspection may not be sufficient for cervical screening, and invasive diagnostic procedures may be required. The goal of our study is to develop an algorithm that determines the visibility of the transformation zone from a single image taken with a smartphone. This problem was posed in a data science competition on Kaggle, an online platform for data science competitions.
Machine Learning Algorithm and Classification
Genetic algorithms have been used for medical image analysis in the past. They are search and optimization algorithms inspired by the process of natural selection. Typically, a large population of candidate solutions to a problem is created at random. The best candidates are then selected, mutated and recombined to create a new generation of solutions. This process is repeated, and the generations “evolve” until a sufficiently good solution is found or a fixed number of generations has passed.
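To make the loop above concrete, here is a minimal Python sketch that evolves candidate circles (x, y, r) towards a dark, roughly round region in a synthetic grayscale image, a toy stand-in for the os. It is a hypothetical illustration under heavy simplifying assumptions, not the authors' implementation: the synthetic image, the ring-contrast fitness function and every name and parameter (`fitness`, `evolve`, population size, mutation rate, ring width) are invented for the example. It uses tournament selection, uniform crossover, small integer mutations and elitism.

```python
import math
import random

# Hypothetical illustration only, not the authors' algorithm: a 40x40 grayscale
# "image" (0 = black, 255 = white) containing one dark disk as a toy os.
W, H = 40, 40
TRUE_X, TRUE_Y, TRUE_R = 25, 15, 6
IMAGE = [[20 if math.hypot(x - TRUE_X, y - TRUE_Y) <= TRUE_R else 200
          for x in range(W)] for y in range(H)]
R_MIN, R_MAX, RING = 2, 12, 3  # allowed radii and width of the comparison ring


def fitness(cand):
    """Mean brightness of a thin ring around the circle minus mean brightness inside it."""
    cx, cy, r = cand
    inside, ring = [], []
    for y in range(max(0, cy - r - RING), min(H, cy + r + RING + 1)):
        for x in range(max(0, cx - r - RING), min(W, cx + r + RING + 1)):
            d = math.hypot(x - cx, y - cy)
            if d <= r:
                inside.append(IMAGE[y][x])
            elif d <= r + RING:
                ring.append(IMAGE[y][x])
    if not inside or not ring:
        return float("-inf")
    return sum(ring) / len(ring) - sum(inside) / len(inside)


def clamp(c):
    # Keep the centre inside the image and the radius within its allowed range.
    cx, cy, r = c
    return (min(max(cx, 0), W - 1), min(max(cy, 0), H - 1), min(max(r, R_MIN), R_MAX))


def random_candidate(rng):
    return (rng.randrange(W), rng.randrange(H), rng.randint(R_MIN, R_MAX))


def crossover(a, b, rng):
    # Uniform crossover: each gene (x, y or r) is copied from one parent at random.
    return tuple(a[i] if rng.random() < 0.5 else b[i] for i in range(3))


def mutate(c, rng, rate=0.3, step=3):
    # Each gene is shifted by a small random integer with probability `rate`.
    return clamp(tuple(g + rng.randint(-step, step) if rng.random() < rate else g
                       for g in c))


def tournament(pop, scores, rng, k=3):
    # Selection: the fittest of k randomly chosen candidates becomes a parent.
    i = max(rng.sample(range(len(pop)), k), key=lambda j: scores[j])
    return pop[i]


def evolve(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [random_candidate(rng) for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        elite = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[elite])
        # Elitism: the best candidate is carried over unchanged.
        nxt = [pop[elite]]
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            nxt.append(mutate(child, rng))
        pop = nxt
    scores = [fitness(c) for c in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return pop[best], history


best, history = evolve()
print("best circle (x, y, r):", best, "fitness:", history[-1])
```

Because the best candidate is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next; the remaining slots are filled by recombined and mutated offspring, which is what lets the population explore new circles.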

Images for training the algorithm were taken from the competition website and are publicly available online. The images were taken with smartphones fitted with a lens made by MobileODT. Approximately 10,000 labeled images are available for training. The algorithm is implemented in Python.
We chose to use a genetic algorithm for image analysis. We believe that if the algorithm correctly identifies three anatomical landmarks in the image, classification becomes relatively easy. These landmarks are: 1) the os, the opening of the uterus; 2) the endocervix, the area of columnar epithelium surrounding the os; 3) the vaginal area in the image, excluding any additional background. Once the anatomical landmarks are correctly identified, we will use a clustering algorithm such as K-Means, or a supervised classifier such as a Support Vector Machine, to classify the images into three categories: Type 1, transformation zone completely visible; Type 2, transformation zone partially visible; Type 3, transformation zone completely hidden.
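A minimal sketch of the clustering option for the final step, assuming each image has already been reduced to a small feature vector derived from the landmarks. The two features (`visible_fraction`, `os_visibility`) and the toy points below are hypothetical, not data from the competition. Because K-Means is unsupervised, its clusters carry no type names; the sketch names each cluster after the majority type of the labeled training images it contains. A Support Vector Machine, by contrast, would learn the boundaries between the three types directly from the labels.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    # Deterministic seeding: start from the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k=3, iters=100):
    centroids = farthest_point_init(points, k)
    assign = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centroids


def clusters_to_types(assign, labels):
    # Name each unnamed cluster after the most common label among its members.
    return {j: Counter(l for a, l in zip(assign, labels) if a == j).most_common(1)[0][0]
            for j in set(assign)}


# Toy, hypothetical feature vectors: (visible_fraction, os_visibility) per image.
points = [(0.92, 0.88), (0.95, 0.91), (0.89, 0.85),   # transformation zone visible
          (0.55, 0.50), (0.50, 0.45), (0.58, 0.52),   # partially visible
          (0.08, 0.12), (0.05, 0.10), (0.11, 0.15)]   # hidden
labels = ["Type 1"] * 3 + ["Type 2"] * 3 + ["Type 3"] * 3

assign, centroids = kmeans(points)
mapping = clusters_to_types(assign, labels)
predicted = [mapping[a] for a in assign]
print(predicted)
```

Farthest-point seeding is used here only to keep the toy run deterministic; with real, overlapping feature distributions the clusters would not line up so neatly with the three types.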

Initial Results
So far, we have built a genetic algorithm that marks the anatomical landmarks mentioned above. It is not yet perfect, and we are now working on optimizing the solutions it produces. After completing the algorithm and the image classification, we will compare our results with those of the 848 teams that competed online. We hope that our work can contribute to cervical cancer screening in remote areas, and we find the challenge of an online competition invigorating.


Author: Guy Zahavi

Coauthor(s): Guy Hachmon MSc, Ronen Tal-Botzer PhD.

Status: Work In Progress

Funding Acknowledgment: No external funding was received for this work.