Deep learning-based catheter segmentation in fluoroscopic videos from head and neck cancer patients

Head and neck cancer patients often develop swallowing difficulties. Several diagnostic methods exist to examine these, but the combination of two established techniques, videofluoroscopic swallow studies (VFSS) and high-resolution impedance manometry (HRIM), is especially useful. Combining the techniques, however, requires a method to integrate their data. The first step is detecting the HRIM catheter in VFSS footage.

In this project, this challenge is tackled through image segmentation using a deep learning approach. The study aims to develop a model capable of consistent and accurate segmentation of the catheter, with a secondary goal of gaining insight into which model and training characteristics are most beneficial for this task.

More than 30 models with varying architectures and hyperparameters were developed for analysis. The primary hyperparameters explored were the optimiser, loss function, and learning rate. In addition, the study examined the use of data augmentation. Finally, the performance of two key architectures was compared: the Base ResUNet and the Attention ResUNet. This culminated in a final model achieving an average Dice score exceeding 87% on the evaluation sets.
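
For readers unfamiliar with the metric, the Dice score reported above measures the overlap between a predicted mask and a reference mask as twice the number of shared foreground pixels divided by the total number of foreground pixels in both masks. The following is a minimal illustrative sketch, assuming binary masks stored as nested lists of 0 and 1; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for this example and are not taken from the study's implementation.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Illustrative only: the name and the empty-mask convention are
    assumptions for this sketch, not the study's actual code.
    """
    # Flatten both 2D masks into 1D lists of pixel labels.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled as catheter in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # Total catheter pixels across the two masks.
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 shared foreground pixels, 3 foreground pixels in each mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(round(dice_score(pred, target), 3))  # 2*2 / (3+3) -> 0.667
```

An average Dice score is then typically obtained by computing this value per frame and taking the mean over the evaluation set; whether the study averaged per frame or per video is not stated here.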