From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited clinical utility because each focuses narrowly on diagnosing a single disease. In real-world clinical practice, multiple diseases must be considered since they can co-exist in the same patient.

In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. We propose CheXViz, an FL framework that trains a single ‘global’ meta-deep learning model across spatially distributed datasets with different disease annotations.

Illustration of the CheXViz framework

We developed CheXViz using a multi-task FL setup. The CheXViz model is initialized as a deep neural network consisting of two distinct blocks: a representation block and a task block. A copy of the model is distributed to all participating nodes, each of which trains on its local task. During training, only the weights of the representation block are aggregated by the central server and redistributed to the nodes; each node's task-specific information is thereby preserved in its local task block.
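The aggregation step above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation: the class and function names (`Node`, `aggregate_representation`) are hypothetical, the weights are random NumPy arrays standing in for network parameters, and the server step is assumed to be a FedAvg-style unweighted average applied only to the representation block.

```python
import numpy as np

rng = np.random.default_rng(0)

class Node:
    """A participating site holding a shared representation block
    and a local, task-specific head (hypothetical names)."""
    def __init__(self, n_features=8, n_classes=1):
        self.repr_weights = rng.normal(size=(n_features, n_features))  # shared, aggregated
        self.task_weights = rng.normal(size=(n_features, n_classes))   # never leaves the node

def aggregate_representation(nodes):
    """FedAvg-style server step (assumed): average ONLY the representation
    weights across nodes, then redistribute the average. Task heads are
    never sent to the server, so task-specific information stays local."""
    avg = np.mean([n.repr_weights for n in nodes], axis=0)
    for n in nodes:
        n.repr_weights = avg.copy()

# e.g., one node annotated for pneumonia, one for pneumothorax
nodes = [Node(), Node()]
aggregate_representation(nodes)

# After a round: shared block is synchronized, task heads remain distinct.
assert np.allclose(nodes[0].repr_weights, nodes[1].repr_weights)
assert not np.allclose(nodes[0].task_weights, nodes[1].task_weights)
```

In a real training loop, each node would run local gradient steps on its own dataset between aggregation rounds; only the representation parameters cross the network.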

In this preliminary work, we demonstrate the utility of CheXViz for training a single model to diagnose pneumonia and pneumothorax using two toy datasets from Kaggle for these two respective diseases. Put another way, we demonstrate how toy datasets from Kaggle can be made clinically useful using FL.

Specifically, we train a single FL classification model (‘global’) using two separate CXR datasets – one annotated for the presence of pneumonia and the other for the presence of pneumothorax (two common and life-threatening conditions) – yielding one model capable of diagnosing both. We compare the performance of the global FL model against models trained separately on each dataset (‘baseline’) for two different model architectures. With a simple 3-layer CNN architecture, the global FL model achieved AUROCs of 0.84 and 0.81 for pneumonia and pneumothorax, respectively, compared to 0.85 and 0.82 for the corresponding baseline models (p>0.05). Similarly, with a pretrained DenseNet121 architecture, the global FL model achieved AUROCs of 0.88 and 0.91 for pneumonia and pneumothorax, respectively, compared to 0.89 and 0.91 for the corresponding baseline models (p>0.05).

ROC curves obtained from baseline and CheXViz models evaluated across both the datasets.

Our findings demonstrate that our FL framework (CheXViz) can be used to create global ‘meta’ models that make toy datasets from Kaggle clinically useful, a step toward bridging the gap from bench to bedside. Although preliminary and limited to two datasets, our framework and results are extensible to any number of datasets and disease labels, as well as to tasks beyond classification (e.g., segmentation and object detection). It is our hope that this work can be a first step toward moving Kaggle CXR datasets from competition to collaboration, transforming these toy datasets into clinically useful models.

The paper was authored by Pranav Kulkarni, Adway Kanhere, Paul H. Yi, and Vishwa S. Parekh and accepted for the MedNeurIPS 2022 workshop. It is now also publicly available on arXiv.
